Image data delivery

ABSTRACT

A conversion apparatus is connected to a camera server delivering image information obtained by image sensing to a display operation terminal. When image information is received from the camera server, the conversion apparatus converts the received image information to a format for a portable display terminal, and delivers it to the portable display terminal.

FIELD OF THE INVENTION

[0001] The present invention relates to a technique for altering image data provided from an image sensing apparatus connected to a network and delivering the image data to image display apparatuses.

BACKGROUND OF THE INVENTION

[0002] A camera with a function that allows an image from the camera, located at a remote site, to be viewed via a network such as the Internet is disclosed in Japanese Patent Laid-Open No. 10-040185. Hereinafter, such a camera having a networking function is referred to as a camera server apparatus. In this conventional example, an image from the camera server apparatus can be viewed simultaneously at a plurality of terminal apparatuses such as personal computers, and the pan and/or tilt angles and the zoom ratio of the camera can be controlled remotely from a plurality of terminal apparatuses.

[0003] In a camera server apparatus system that allows a plurality of terminal apparatuses to control one camera, the right to control the single physically available camera must be mediated. To this end, if the concept of a control right disclosed in Japanese Patent Laid-Open No. 10-042278 is introduced, a user can control the camera only during the period in which he or she holds the control right. Separately, a technique of superimposing information on an image from such a camera server apparatus is disclosed in Japanese Patent Laid-Open No. 11-196404.

[0004] In recent years, advances in cellular phone and portable terminal technology have made it possible to view and manipulate camera images from such devices. However, if the image from the camera server apparatus is to be provided not only to terminals such as personal computers but also to portable terminals such as cellular phones, the camera server apparatus needs two interfaces, one for each type of terminal, because portable terminals differ from personal computers and similar terminals in image providing scheme, image format and the like. As a result, the cost of the camera server apparatus increases. Similarly, a dedicated interface for controlling the camera from the portable terminal must be provided separately on the camera server apparatus side, further increasing the complexity and cost of the camera server apparatus.

[0005] In addition, an advertisement cannot be flexibly superimposed on the image from a camera server apparatus that lacks a function for superimposing an advertisement or the like on the image. If the volume of information to be superimposed is considerably large, retaining that information in the camera server apparatus is a function distinct from its original function of delivering images, so superimposition in the camera server apparatus is not cost-effective. Furthermore, the conventional technique cannot, for example, superimpose advertisement information on an image provided to a cellular phone while leaving the image provided to a conventional terminal free of advertisement information.

[0006] The technique in which a camera located at a remote site is controlled via a network to obtain and display an image is characterized by a high degree of freedom in camera control, such as pan, tilt, zoom and backlight correction. Television conference systems, in which images and voices at a plurality of sites are sent and received via a network as image-voice pairs, are also in general use. Further, the technique of playing back images and sound while they are being downloaded via a network is called streaming, and live delivery techniques, in which coding, network delivery, reception and playback of the image and sound are performed concurrently, are in use.

[0007] As for matching images with voice, Japanese Patent Laid-Open No. 11-305318 describes an image sensing apparatus that outputs an image and sound with camera parameters matched to the sound. Japanese Patent Laid-Open No. 08-56326 describes an apparatus that selects and outputs an image and sound. Japanese Patent Laid-Open No. 10-93941 discloses an example of a television conference system in which a plurality of sites are connected together and the image and voice to be used are switched among them.

[0008] In a so-called web camera, in which a camera located at a remote site is controlled via a network, generally only the image can be obtained and no sound is available. A television conference system, on the other hand, can send and receive images and voice in addition to performing camera control, but because of its intended use, the image and voice are input to the same bidirectional communication apparatus at the same location. Moreover, the destination with which the image and voice are exchanged is generally specified explicitly by the user of the terminal.

[0009] In the image streaming technique, one image with sound is delivered to numerous receiving apparatuses, and an arbitrary image is not normally combined with arbitrary sound. Furthermore, the previously disclosed apparatus that selects and combines images and sound cannot combine an image with arbitrary sound available on the network.

[0010] Image delivery systems that continuously deliver images via a data transmission medium such as the Internet or an intranet have already become widespread and are used in a variety of fields, such as transmission of live images, indoor and outdoor monitoring, and observation of animals and plants.

[0011] These image delivery systems use image delivery servers to deliver images, and many of the image delivery servers employ the JPEG coding mode (the international standard image coding mode defined by ISO/IEC 10918) as their image coding mode.

[0012] Coded image data conforming to the JPEG coding mode (JPEG coded data) sent from the image delivery server is received by a client terminal, where it is decoded and then displayed on the screen. Since many of the currently popular PCs (personal computers) and PDAs (personal digital assistants) can decode JPEG coded data as a standard function, PCs and PDAs are used as client terminals.

[0013] In recent years, the cellular phone has come into wide use; among portable terminals used in Japan, the cellular phone surpasses the notebook PC and the PDA in penetration rate. In addition, cellular phone functionality has improved rapidly, and cellular phones compatible with the third-generation communication mode recently commercialized in Japan are provided, as a standard function, with the ability to decode coded data conforming to the MPEG4 coding mode (the international standard audio and image coding mode defined by ISO/IEC 14496), that is, MPEG4 coded data. However, the cellular phone is not normally provided with a function for decoding JPEG coded data, so JPEG coded data cannot be sent directly from the image delivery server to the cellular phone.

[0014] Two methods can be considered for solving this problem. In the first method, the image delivery server is modified so that it can send MPEG4 coded data. With this method, however, each existing image delivery server must be replaced with a new one, so the replacement cost increases considerably in proportion to the number of installed image delivery servers.

[0015] In the second method, a relay server is installed at some point in the communication path between the image delivery server and the cellular phone, and this relay server converts JPEG coded data into MPEG4 coded data. The advantage of this method is that a plurality of image delivery servers can be connected to one relay server, so the number of relay servers to be installed, and thus the installation cost, can be significantly reduced.

[0016] However, the relay-server method has a disadvantage. The image size normally decodable by the cellular phone is the QCIF (Quarter CIF) size (lateral: 176 pixels; longitudinal: 144 pixels), whereas the image size normally handled by the conventional image delivery server is the QVGA (Quarter VGA) size (lateral: 320 pixels; longitudinal: 240 pixels) or the 1/16 VGA size (lateral: 160 pixels; longitudinal: 120 pixels). JPEG coded data of the QVGA size or 1/16 VGA size must therefore be converted into MPEG4 coded data of the QCIF size, and this conversion of coded data may degrade image quality.

[0017] For example, in a conventional method of converting the resolution of JPEG coded data, disclosed in Japanese Patent Laid-Open No. 4-229382, the image size is reduced by a factor of m/8 laterally and n/8 longitudinally (m and n each being an integer from 1 to 7) by extracting only the low-order coefficients from the orthogonal transform data of each block obtained during JPEG decoding and subjecting them to an inverse orthogonal transform. However, conversion from the QVGA size to the QCIF size requires a factor of 0.55 (4.4/8) laterally and 0.6 (4.8/8) longitudinally, and conversion from the 1/16 VGA size to the QCIF size requires a factor of 1.1 (8.8/8) laterally and 1.2 (9.6/8) longitudinally. Since neither m nor n is then an integer, this method cannot convert the QVGA size or the 1/16 VGA size to the QCIF size.
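The arithmetic in [0017] can be checked mechanically. The following Python sketch (function names and the size table are illustrative, not taken from the source) expresses each required factor exactly as k/8 and tests whether k is an integer from 1 to 7, which is the condition for the low-order-coefficient method described above.

```python
from fractions import Fraction

BLOCK = 8  # JPEG DCT block edge length in pixels

# Image sizes (width, height) named in paragraph [0016].
SIZES = {"QVGA": (320, 240), "1/16 VGA": (160, 120), "QCIF": (176, 144)}


def dct_numerator(src: int, dst: int) -> Fraction:
    """Return k such that dst/src == k/8, as an exact fraction."""
    return Fraction(dst * BLOCK, src)


def dct_scalable(src: int, dst: int) -> bool:
    """True if dst/src equals m/8 for an integer m with 1 <= m <= 7,
    i.e. the ratio is reachable by keeping only m low-order coefficients."""
    k = dct_numerator(src, dst)
    return k.denominator == 1 and 1 <= k.numerator <= 7


if __name__ == "__main__":
    qcif_w, qcif_h = SIZES["QCIF"]
    for name in ("QVGA", "1/16 VGA"):
        w, h = SIZES[name]
        kx, ky = dct_numerator(w, qcif_w), dct_numerator(h, qcif_h)
        ok = dct_scalable(w, qcif_w) and dct_scalable(h, qcif_h)
        print(f"{name} -> QCIF: {float(kx)}/8 x {float(ky)}/8, m/8 method usable: {ok}")
```

Run as a script, it reports 4.4/8 x 4.8/8 for QVGA and 8.8/8 x 9.6/8 for 1/16 VGA, both marked unusable. Using `Fraction` rather than floating point keeps the integer test exact.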

[0018] Other general conventional methods of converting image resolution include thinning out pixels at a fixed ratio (scale-down), repeatedly inserting the same pixels (scale-up), and calculating the weighted average of a plurality of neighboring pixels to generate a new pixel value. These methods allow the image size to be converted at any ratio. However, they have the problems described below with reference to FIGS. 44 to 47.
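To make the three conventional methods in [0018] concrete, here is a one-dimensional Python sketch applied to a single row of pixels; a two-dimensional image would be processed row by row and then column by column. Nearest-sample indexing covers both thinning out and pixel repetition. The weighted-average method is shown as linear interpolation between the two nearest source pixels, which is one common choice of weights and is an assumption here; the function names are illustrative.

```python
def nearest(row: list[float], dst_len: int) -> list[float]:
    """Thin out (dst_len < len(row)) or repeat (dst_len > len(row)) pixels
    by picking the source sample at a fixed ratio."""
    src_len = len(row)
    return [row[i * src_len // dst_len] for i in range(dst_len)]


def linear(row: list[float], dst_len: int) -> list[float]:
    """Generate each output pixel as a weighted average of the two nearest
    source pixels (linear interpolation, pixel-centre aligned)."""
    src_len = len(row)
    out = []
    for i in range(dst_len):
        # Map the centre of output pixel i back into source coordinates.
        x = (i + 0.5) * src_len / dst_len - 0.5
        x = min(max(x, 0.0), src_len - 1.0)
        x0 = int(x)
        x1 = min(x0 + 1, src_len - 1)
        t = x - x0  # weight of the right-hand neighbour
        out.append((1.0 - t) * row[x0] + t * row[x1])
    return out


if __name__ == "__main__":
    qvga_row = [float(v % 256) for v in range(320)]
    print(len(nearest(qvga_row, 176)), len(linear(qvga_row, 176)))  # 176 176
```

Both functions accept any output length, which is exactly why they can reach the 0.55 and 1.1 factors that the m/8 method cannot; the cost, as the following paragraphs explain, is that they ignore where the JPEG block borders lie.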

[0019] FIG. 44 shows the correspondence between image areas before and after a QVGA size image is converted into a QCIF size image by the conventional technique. As shown in this figure, an image area of 320 pixels laterally and 240 pixels longitudinally is scaled down to an image area of 176 pixels laterally and 144 pixels longitudinally. This corresponds to a conversion factor of 0.55 (4.4/8) laterally and 0.6 (4.8/8) longitudinally, as described previously.

[0020] FIG. 45 illustrates the shifting of block border lines caused by the image size conversion of FIG. 44. In this figure, solid lines show border lines spaced 8 pixels apart both laterally and longitudinally, and dotted lines show border lines spaced 4.4 (= 8 × 0.55) pixels apart laterally and 4.8 (= 8 × 0.6) pixels apart longitudinally. That is, the size conversion of FIG. 44 shifts the block border lines of the image before conversion from the positions shown by solid lines to the positions shown by dotted lines. The converted image is then divided again along block border lines at the positions shown by solid lines and subjected to MPEG4 image coding, so the image obtained after MPEG4 image decoding has block border lines at both the dotted-line and the solid-line positions.
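The border misalignment of FIG. 45 can be quantified. This Python sketch (helper names are hypothetical) maps each internal 8-pixel JPEG block border through the resize factor and reports the mapped borders that do not land on the 8-pixel grid that MPEG4 coding imposes afterwards; each such border adds a second set of block edges to the decoded picture.

```python
from fractions import Fraction

BLOCK = 8


def mapped_borders(src_len: int, dst_len: int, block: int = BLOCK) -> list[Fraction]:
    """Positions, in destination pixels, of the internal source block borders."""
    scale = Fraction(dst_len, src_len)
    return [k * block * scale for k in range(1, src_len // block)]


def off_grid(src_len: int, dst_len: int, block: int = BLOCK) -> list[Fraction]:
    """Mapped borders that do not coincide with the destination block grid."""
    return [p for p in mapped_borders(src_len, dst_len, block) if p % block != 0]


if __name__ == "__main__":
    for axis, (src, dst) in {"lateral": (320, 176), "longitudinal": (240, 144)}.items():
        total = len(mapped_borders(src, dst))
        print(f"{axis}: {len(off_grid(src, dst))} of {total} borders off the 8-pixel grid")
```

For the QVGA to QCIF case this reports 38 of 39 lateral and 24 of 29 longitudinal borders off the grid. Calling `off_grid(160, 176)` and `off_grid(120, 144)` covers the 1/16 VGA case of FIG. 47 (18 and 12 misaligned borders, respectively).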

[0021] The block border lines at the dotted-line positions are created when the image is JPEG-coded in the image delivery server, and block deformations at these positions become more noticeable as the JPEG compression rate increases. The block border lines at the solid-line positions are created when the image is MPEG4-coded in the relay server, and block deformations at these positions likewise become more noticeable as the MPEG4 compression rate increases.

[0022] The communication bandwidth between the image delivery server and the cellular phone is currently several tens to several hundreds of kilobits per second, which is insufficient for transmitting a smoothly moving image, so the image compression rate is normally set high. Block deformations therefore appear plainly at both the dotted-line and the solid-line positions shown in FIG. 45, and consequently the quality of the images viewed by the cellular phone user is significantly reduced.

[0023] FIG. 46 shows the correspondence between image areas before and after a 1/16 VGA size image is converted into a QCIF size image by the conventional technique. As shown in this figure, an image area of 160 pixels laterally and 120 pixels longitudinally is scaled up to an image area of 176 pixels laterally and 144 pixels longitudinally. This corresponds to a conversion factor of 1.1 (8.8/8) laterally and 1.2 (9.6/8) longitudinally, as described previously.

[0024] FIG. 47 illustrates the shifting of block border lines caused by the image size conversion of FIG. 46. In this figure, solid lines show border lines spaced 8 pixels apart both laterally and longitudinally, and dotted lines show border lines spaced 8.8 (= 8 × 1.1) pixels apart laterally and 9.6 (= 8 × 1.2) pixels apart longitudinally. That is, the size conversion of FIG. 46 shifts the block border lines of the image before conversion from the positions shown by solid lines to the positions shown by dotted lines. The converted image is then divided again along block border lines at the positions shown by solid lines and subjected to MPEG4 image coding, so the image obtained after MPEG4 image decoding has block border lines at both the dotted-line and the solid-line positions.

[0025] That is, in the case of the 1/16 VGA size image as well, block deformations occur at both the dotted-line and the solid-line positions, and consequently the quality of the images viewed by the cellular phone user is significantly reduced.

SUMMARY OF THE INVENTION

[0026] The present invention has been made in consideration of the above situation, and its first object is to eliminate the need to provide an additional interface in a camera server apparatus for communicating with portable terminals and the like, thereby avoiding an increase in the cost of the camera server apparatus.

[0027] The second object is to eliminate the need to provide an additional dedicated interface for controlling the camera server apparatus, thereby avoiding an increase in the cost of the camera server apparatus.

[0028] The third object is to realize superimposition of information on an image provided from the camera server apparatus without adding redundant functions, such as information superimposition processing, to the camera server apparatus, thereby avoiding an increase in its cost.

[0029] According to the present invention, the foregoing first object is attained by providing an image processing apparatus comprising: an image information reception unit adapted to receive image information obtained by sensing from an image sensing apparatus capable of delivering the image information to a first image display apparatus; and an image information delivering unit adapted to convert the image information received by the image information reception unit to a format for a second image display apparatus different in type from the first image display apparatus, and to deliver the converted image information to the second image display apparatus.

[0030] The fourth object is to make it possible, in a system in which a camera located at a remote site is controlled via a network to obtain an image, to code image data and sound data existing at different locations on the network as an image with sound and to send it to a receiving apparatus.

[0031] According to the present invention, the foregoing fourth object is attained by providing an information delivery apparatus comprising: an image data reception unit adapted to receive image data from a plurality of image sending apparatuses capable of sending the image data; a sound data reception unit adapted to receive sound data from a plurality of sound sending apparatuses; a coding unit adapted to selectively combine the image data received by the image data reception unit with the sound data received by the sound data reception unit, and to code the combined data as image data with sound; and a delivering unit adapted to deliver the image data with sound generated by the coding unit to a receiving apparatus.
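One way to picture the selective combination in [0031] is a relay that keeps the latest data from each source together with a table pairing image sources with sound sources. The Python sketch below is hypothetical throughout (the class, the source identifiers and the tuple output are illustrative), and the actual coding into an image-with-sound stream is omitted: a `(frame, sound)` tuple stands in for the coded result.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ImageSoundRelay:
    # Correspondence table: image source id -> sound source id (hypothetical).
    pairing: dict[str, str]
    latest_image: dict[str, bytes] = field(default_factory=dict)
    latest_sound: dict[str, bytes] = field(default_factory=dict)

    def on_image(self, source: str, frame: bytes) -> None:
        """Image data reception: remember the newest frame per image source."""
        self.latest_image[source] = frame

    def on_sound(self, source: str, chunk: bytes) -> None:
        """Sound data reception: remember the newest chunk per sound source."""
        self.latest_sound[source] = chunk

    def combine(self, image_source: str) -> Optional[tuple[bytes, bytes]]:
        """Select the sound paired with image_source and combine the two.
        Returns None until a frame exists; uses empty sound if none is paired."""
        frame = self.latest_image.get(image_source)
        if frame is None:
            return None
        sound_source = self.pairing.get(image_source)
        chunk = self.latest_sound.get(sound_source, b"") if sound_source else b""
        return frame, chunk
```

Because the pairing lives in a table rather than in the sending apparatuses, any image source can be matched with any sound source on the network, which is the capability [0009] notes the prior art lacks.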

[0032] The fifth object is to minimize the reduction in image quality caused by conversion, in particular the apparent increase in block deformation, when coded data sent from an image delivery server is converted to a different data format by a relay server.

[0033] According to the present invention, the foregoing fifth object is attained by providing a conversion processing method in which image data of a first size, coded by a first coding method in which data is divided into first blocks and coded, is converted into image data of a second size, coded by a second coding method in which data is divided into second blocks and coded, the method comprising: clipping image data equivalent to the second size from the image data of the first size along border lines of the first blocks; and coding the clipped image data equivalent to the second size by the second coding method.
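The clipping step of [0033] can be sketched as follows for the case in which the first size is at least as large as the second. The Python below uses hypothetical names, and centring the window is an assumption (the source requires only alignment with the first blocks). It snaps the window origin to the 8-pixel JPEG grid so that no JPEG block is cut; images are plain lists of pixel rows.

```python
BLOCK = 8


def aligned_clip_origin(src_w: int, src_h: int, dst_w: int, dst_h: int,
                        block: int = BLOCK) -> tuple[int, int]:
    """Top-left corner of a dst_w x dst_h window, roughly centred in the
    source and rounded down onto the source block grid."""
    if dst_w > src_w or dst_h > src_h:
        raise ValueError("clip window larger than source image")
    x = (src_w - dst_w) // 2 // block * block
    y = (src_h - dst_h) // 2 // block * block
    return x, y


def clip(image: list[list[int]], x: int, y: int, w: int, h: int) -> list[list[int]]:
    """Cut out the w x h window whose top-left pixel is (x, y)."""
    return [row[x:x + w] for row in image[y:y + h]]


if __name__ == "__main__":
    x, y = aligned_clip_origin(320, 240, 176, 144)
    print(x, y)  # 72 48: both multiples of 8
```

For QVGA to QCIF the origin is (72, 48). Because 176 and 144 are themselves multiples of 8, the right and bottom edges (248 and 192) also fall on JPEG block borders, so every 8-pixel MPEG4 block of the clipped image coincides with an original JPEG block and no second set of block edges is introduced.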

[0034] Other features and advantages of the present invention will be apparent from the following description taken in conjunction with the accompanying drawings, in which like reference characters designate the same or similar parts throughout the figures thereof.

BRIEF DESCRIPTION OF THE DRAWINGS

[0035] The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention.

[0036] FIG. 1 is a schematic diagram showing a physical configuration of an information delivery system according to a first embodiment of the present invention;

[0037] FIG. 2 is a block diagram showing a configuration of a camera server apparatus according to the first embodiment of the present invention;

[0038] FIG. 3 shows one example of a user interface screen in a display operation terminal;

[0039] FIG. 4 shows one example of the external appearance of a portable display terminal;

[0040] FIG. 5 shows a logical configuration of the information delivery system focusing on the flow of data according to the first embodiment of the present invention;

[0041] FIG. 6 is a flowchart showing operations of an image converting unit of a conversion server apparatus according to the first embodiment of the present invention;

[0042] FIG. 7 shows a database configuration of an advertisement server apparatus according to the first embodiment of the present invention;

[0043] FIG. 8 is a flowchart showing the flow from acquisition of a camera control right to issuance of a camera control command according to the first embodiment of the present invention;

[0044] FIG. 9 shows the data formats of a request for the camera control right and of the camera control command according to the first embodiment of the present invention;

[0045] FIG. 10 shows the flow from issuance of a control command sound to reception of a response sound according to the first embodiment of the present invention;

[0046] FIGS. 11A and 11B show correspondence tables between key buttons and camera control commands and between responses from the camera server apparatus and response sounds played back in the portable display terminal according to the first embodiment of the present invention;

[0047] FIG. 12 is a flowchart showing the flow of response operations performed by the camera server apparatus in response to the control command according to the first embodiment of the present invention;

[0048] FIG. 13 shows a data format of the camera control command according to the first embodiment of the present invention;

[0049] FIG. 14 shows a format of sound data exchanged between a delivery server apparatus and the conversion server apparatus according to the first embodiment of the present invention;

[0050] FIG. 15 shows a data format of sound data exchanged between the delivery server apparatus and the conversion server apparatus according to the first embodiment of the present invention;

[0051] FIG. 16 is a flowchart showing the outline of the flow during the request for the control right in a control right managing and sound converting unit of the conversion server apparatus according to the first embodiment of the present invention;

[0052] FIG. 17 is a flowchart showing the flow of camera switching control according to a second embodiment of the present invention;

[0053] FIG. 18 shows a correspondence table of camera numbers, camera names and camera addresses according to the second embodiment of the present invention;

[0054] FIG. 19 shows a configuration of an advertisement information table according to a third embodiment of the present invention;

[0055] FIG. 20 is a block diagram showing a configuration of the information delivery system according to a fourth embodiment of the present invention;

[0056] FIG. 21 is a flowchart showing operations of the conversion server apparatus according to a fifth embodiment of the present invention;

[0057] FIG. 22 shows an outlined configuration of the information delivery system according to a sixth embodiment of the present invention;

[0058] FIG. 23 is a block diagram showing hardware configurations of an image server and a sound server according to the sixth embodiment of the present invention;

[0059] FIG. 24 is a block diagram showing a software configuration of the information delivery system according to the sixth embodiment of the present invention;

[0060] FIG. 25 shows an operation procedure of a software module of the information delivery system according to the sixth embodiment of the present invention;

[0061] FIGS. 26A to 26C show table configurations for the relay server to manage image information, sound information and image-sound correspondences according to the sixth embodiment of the present invention;

[0062] FIG. 27 is a flowchart showing a procedure of request processing process of the relay server according to the sixth embodiment of the present invention;

[0063] FIG. 28 is a flowchart showing a procedure of image reception process of the relay server according to the sixth embodiment of the present invention;

[0064] FIG. 29 is a flowchart showing a procedure of sound reception process of the relay server according to the sixth embodiment of the present invention;

[0065] FIG. 30 is a flowchart showing a procedure of image-sound synthesis and transmission process of the relay server according to the sixth embodiment of the present invention;

[0066] FIGS. 31A and 31B show table configurations for the relay server to manage condition information and image-sound correspondences according to a seventh embodiment of the present invention;

[0067] FIG. 32 is a flowchart showing a procedure of request processing process of the relay server according to the seventh embodiment of the present invention;

[0068] FIG. 33 is a block diagram showing an outlined configuration of the information delivery system according to an eighth embodiment of the present invention;

[0069] FIG. 34 shows a configuration of the information delivery system according to a tenth embodiment of the present invention;

[0070] FIG. 35 is a flowchart showing conversion processing according to the tenth embodiment of the present invention;

[0071] FIG. 36 illustrates the conversion of the image size according to the tenth embodiment of the present invention;

[0072] FIG. 37 is a flowchart showing conversion processing according to an eleventh embodiment of the present invention;

[0073] FIG. 38 illustrates the conversion of the image size according to the eleventh embodiment of the present invention;

[0074] FIG. 39 is a flowchart showing conversion processing according to the twelfth embodiment of the present invention;

[0075] FIG. 40 is a flowchart showing conversion processing according to the thirteenth embodiment of the present invention;

[0076] FIG. 41 is a flowchart showing conversion processing according to a fourteenth embodiment of the present invention;

[0077] FIG. 42 shows a growth of block deformation occurring when the image size is reduced by a factor of 2 in both lateral and longitudinal directions according to the fourteenth embodiment of the present invention;

[0078] FIG. 43 is a flowchart showing conversion processing according to a fifteenth embodiment of the present invention;

[0079] FIG. 44 illustrates the conversion of image size according to the prior art;

[0080] FIG. 45 illustrates the block deformation occurring when the image size is converted in the prior art;

[0081] FIG. 46 illustrates the conversion of the image size in the prior art; and

[0082] FIG. 47 illustrates the block deformation occurring when the image size is converted in the prior art.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

[0083] Preferred embodiments of the present invention will be described in detail in accordance with the accompanying drawings.

[0084] <First Embodiment>

[0085] The first embodiment aims to realize the following: superimposition, in a conversion server apparatus during delivery, of information such as an advertisement on an image captured by a remotely controllable camera server apparatus; delivery of the image to a portable display terminal; and camera control from the portable display terminal.

[0086] FIG. 1 is a schematic diagram of a physical configuration of an image delivery system according to the first embodiment. As shown in FIG. 1, the image delivery system is constituted by a camera server apparatus system, consisting of a camera server apparatus 111, a display operation terminal 112 and a first network 113, together with a conversion server apparatus 114, an advertisement server apparatus 115, a second network 116, a delivery server apparatus 117, a third network 118 and a portable display terminal 119.

[0087] In the camera server apparatus system, the address of the camera server apparatus 111 is designated from the display operation terminal 112 via the first network 113 to establish a connection and obtain a real-time image sensed by the camera server apparatus 111, and a camera control right is obtained to perform camera control as necessary. A plurality of display operation terminals 112 and camera server apparatuses 111 may exist as long as they can be identified on the network.

[0088] FIG. 2 is a block diagram showing a configuration of the camera server apparatus 111. An image sensed by an image sensing unit 121, namely a camera, is captured as digital data by an image capturing and compressing unit 122, which generates a compressed image in Motion JPEG format, and an image communicating unit 125 delivers the image to each display operation terminal requesting a connection. If connection requests are made from a plurality of display operation terminals, the image is delivered to all of them simultaneously. In addition, the display operation terminal that has acquired the right to control the camera (camera control right) can issue camera control commands to a camera control unit 123 to perform camera control such as pan, tilt and zoom. A camera control communicating unit 126 generates and interprets such camera control commands and controls the responses. A control right managing unit 124 manages information associated with the control right, such as the remaining time during which the display operation terminal currently holding the control right can control the image sensing unit 121, the list of display operation terminals requesting the control right, and their priorities. A communication control unit 127 controls communication between the outside and the image communicating unit 125 and camera control communicating unit 126.
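The bookkeeping attributed to the control right managing unit 124 (remaining time, list of requesting terminals, priority) can be sketched as a small class. This Python sketch is hypothetical: the class and method names are illustrative, and the fixed time slice and the rule "higher priority first, then arrival order" are assumptions about how those items might be used. Times are passed in explicitly so that the behaviour is deterministic.

```python
import heapq
import itertools
from typing import Optional


class ControlRightManager:
    """One holder per camera, a fixed time slice, and waiting terminals
    served by priority, then arrival order (hypothetical policy)."""

    def __init__(self, slice_seconds: float) -> None:
        self.slice = slice_seconds
        self.holder: Optional[str] = None
        self.expires_at = 0.0
        self._waiting: list[tuple[int, int, str]] = []
        self._arrival = itertools.count()

    def request(self, terminal: str, priority: int, now: float) -> bool:
        """Ask for the control right; True if `terminal` holds it afterwards."""
        self._expire(now)
        if self.holder is None:
            self._grant(terminal, now)
        elif terminal != self.holder and all(t != terminal for _, _, t in self._waiting):
            # Higher priority first (negated for the min-heap), then arrival order.
            heapq.heappush(self._waiting, (-priority, next(self._arrival), terminal))
        return terminal == self.holder

    def may_control(self, terminal: str, now: float) -> bool:
        """Whether a camera control command from `terminal` should be accepted."""
        self._expire(now)
        return terminal == self.holder

    def remaining(self, now: float) -> float:
        """Time left for the current holder (0 if nobody holds the right)."""
        self._expire(now)
        return max(0.0, self.expires_at - now) if self.holder else 0.0

    def _grant(self, terminal: str, start: float) -> None:
        self.holder = terminal
        self.expires_at = start + self.slice

    def _expire(self, now: float) -> None:
        # Hand the right on, slice by slice, until the current slice covers `now`.
        while self.holder is not None and now >= self.expires_at:
            if self._waiting:
                _, _, nxt = heapq.heappop(self._waiting)
                self._grant(nxt, self.expires_at)
            else:
                self.holder = None
```

Checking `may_control` before forwarding each command to the camera control unit 123 enforces the rule, stated in [0089], that only one client at a time holds the control right.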

[0089] In the first embodiment, the display operation terminal 112 is constituted by an apparatus, such as a personal computer, capable of graphical screen operations. When the address of the camera server apparatus 111 is designated from the display operation terminal 112 to establish a connection with the camera server apparatus 111, a user interface screen as shown in FIG. 3 is displayed. In FIG. 3, reference numeral 131 denotes an image display window, in which an image obtained from the camera server apparatus 111 is decompressed and displayed. To control the camera, a button 135 is first pressed, and after the control right is acquired, a scroll bar 132 for camera pan, a scroll bar 133 for camera tilt and a scroll bar 134 for camera zoom are operated to control the camera. Note that the control right for camera operation can be held by only one client at a time for each camera server apparatus 111. A camera control command is issued in response to each such operation; the camera control commands used here and the responses from the camera server apparatus are shown in FIG. 9. Details of the commands are described later.

[0090] The conversion server apparatus 114 converts the compressed Motion JPEG image obtained from the camera server apparatus 111 to a compressed image format (MPEG4 in the first embodiment) that the portable display terminal 119 can display, superimposes information obtained from the advertisement server apparatus 115 on the image as necessary, and delivers the image to the delivery server apparatus 117 through the second network 116.
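The superimposition step can be illustrated on a decoded frame before it is re-encoded. The Python sketch below assumes, without support from the source, that the conversion server works on 8-bit luminance values between JPEG decoding and MPEG4 encoding and that advertisement information is alpha-blended onto the frame; the function name and the `alpha` parameter are illustrative. Frames are lists of rows, and the input frame is left unchanged.

```python
def superimpose(frame: list[list[int]], overlay: list[list[int]],
                x: int, y: int, alpha: float = 0.6) -> list[list[int]]:
    """Return a copy of `frame` with `overlay` blended in at (x, y).
    alpha = 1.0 pastes the overlay opaquely; parts outside the frame are dropped."""
    out = [row[:] for row in frame]
    for j, overlay_row in enumerate(overlay):
        fy = y + j
        if not 0 <= fy < len(out):
            continue
        for i, pixel in enumerate(overlay_row):
            fx = x + i
            if 0 <= fx < len(out[fy]):
                blended = (1.0 - alpha) * out[fy][fx] + alpha * pixel
                out[fy][fx] = min(255, max(0, round(blended)))
    return out
```

In this arrangement the camera server apparatus 111 never performs the blend, so the display operation terminal 112 can keep receiving the unmodified Motion JPEG stream while only the stream bound for the portable display terminal 119 carries the advertisement.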

[0091] The delivery server apparatus 117 delivers the image to the plurality of portable display terminals 119 with which connections are established. Each portable display terminal 119 decodes and displays the received MPEG4 image. It is assumed that the portable display terminal 119, such as a cellular phone or a personal digital assistant, can receive digital images at a high rate of, for example, 64 kbps. An example of the portable display terminal 119 is shown in FIG. 4, in which reference numeral 141 denotes an image and information displaying unit and reference numeral 142 denotes a key button unit.
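To give a rough sense of scale for the 64 kbps figure, the Python arithmetic below divides the bit rate by an assumed frame rate of 5 frames per second (the source gives no frame rate) and compares the resulting per-frame budget with one uncompressed QCIF frame in 8-bit 4:2:0 form. The function names are illustrative.

```python
def bytes_per_frame(bitrate_bps: float, fps: float) -> float:
    """Average coded bytes available for each frame at a given bit rate."""
    return bitrate_bps / fps / 8


def raw_yuv420_bytes(width: int, height: int) -> int:
    """Size of one uncompressed 8-bit 4:2:0 frame (Y plus quarter-size Cb, Cr)."""
    return width * height * 3 // 2


if __name__ == "__main__":
    budget = bytes_per_frame(64_000, 5)   # 1600.0 bytes per frame
    raw = raw_yuv420_bytes(176, 144)      # 38016 bytes per frame
    print(f"{budget:.0f} B/frame, raw {raw} B, ratio about {raw / budget:.1f}:1")
```

Under these assumptions each coded frame must be about 24 times smaller than the raw frame, which illustrates why, as [0022] notes, the compression rate on this link is normally set high.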

[0092] In the first embodiment, the conversion server apparatus 114 converts a Motion JPEG image into an MPEG4 image. Thus, the image formats used in the system are assumed to have a simple profile, in which the path from the camera server apparatus 111 to the conversion server apparatus 114 or the display operation terminal 112 uses Motion JPEG, and the path from the conversion server apparatus 114 to the portable display terminal 119 uses the visual portion of MPEG4.

[0093] In the first embodiment, however, the image compression format is not particularly limited; any scheme is acceptable as long as the conversion server apparatus 114 can convert the compressed image format received from the camera server apparatus into a compressed image format that the portable display terminal 119 can display. Format conversion is not necessarily required, provided that information can be superimposed and the image is correctly delivered and displayed. Furthermore, an uncompressed image is acceptable if image compression is considered unnecessary in view of processing and network loads.

[0094] As for camera control, the portable display terminal 119 makes a request for the control right to the camera server apparatus 111 to acquire the control right, and after the control right is acquired, a control command is issued and delivered to the camera server apparatus 111 via the delivery server apparatus 117 and the conversion server apparatus 114. The first embodiment is described assuming that a bidirectional sound channel for speech communication is used to carry the control signal and its response along the path from the portable display terminal 119 through the delivery server apparatus 117 to the conversion server apparatus 114. This will be described in detail later.

[0095] Furthermore, when seen from the camera server apparatus 111, the conversion server apparatus 114 looks identical to the display operation terminal 112 except in some respects. Each apparatus described in the first embodiment, except for the portable display terminal 119, is assigned an IP address (hereinafter referred to simply as an address) as an identification address allowing it to be identified on the network. The portable display terminal 119, on the other hand, can be identified on the network by a scheme specific to cellular phones, namely a telephone number. Any identification scheme is acceptable, however, as long as each apparatus and terminal can be identified for carrying out communications.

[0096] The first network 113 may be any digital network, such as the internet or an intranet, having a bandwidth sufficient for passing camera control commands and images among the plurality of display operation terminals 112, conversion server apparatuses 114 and camera server apparatuses 111 existing on the network. Note that, in the first embodiment, the image passed through the first network 113 is packetized Motion JPEG, and each camera control command and its response are also packetized command by command.

[0097] The second network 116 may be any digital network, such as the internet or an intranet, having a bandwidth sufficient for passing images, camera control commands and responses between the conversion server apparatus 114 and the delivery server apparatus 117. In the first embodiment, the image passed through the second network 116 is a packetized MPEG4 image, and the camera control commands and responses are bidirectional digital sound data, that is, sound digitized and packetized as described later.

[0098] In the first embodiment, the third network 118 is a cellular phone network, wireless on the side of the portable display terminal 119, having a bandwidth sufficient for passing images and camera control commands between the delivery server apparatus 117 and the portable display terminal 119. Logically, any physical configuration is acceptable for the third network 118 as long as the bandwidth necessary and sufficient for communications between the delivery server apparatus 117 and the portable display terminal 119 can be secured. In the first embodiment, the image passed through the third network 118 is a packetized MPEG4 image, and the camera control commands and responses are passed as sounds over the bidirectional sound channel for speech communication on the second and third networks 116 and 118, as described later. In addition, the conversion server apparatus 114 may be connected to the advertisement server apparatus 115 via any network having a bandwidth sufficient for passing advertisement information.

[0099] A logical configuration focusing on data flow is shown in FIG. 5, in which elements identical to those of FIG. 1 are given the same reference numerals. The display operation terminal 112 is a client of the camera server apparatus 111 and is constituted by a camera operating unit 161 and a display unit 162. On the operation screen of FIG. 3, the camera operating unit 161 corresponds to elements 133 to 135, and the display unit 162 corresponds to the image display window 131. The display operation terminal 112 exchanges data with the camera control communicating unit 127 and the image communicating unit 125 of the camera server apparatus 111 to display an image on the display unit 162, and performs camera control using the camera operating unit 161. Note that a plurality of display operation terminals 112 may be connected to one camera server apparatus 111 at a time, as described previously.

[0100] The conversion server apparatus 114 is constituted by an image converting unit 164, which performs stream conversion to change the compression format of the image and superimposes advertisement information obtained from the advertisement server apparatus 115 on the image, and a control right managing and sound converting unit 163, which manages the control right as described later and converts specific digital sound data into camera control commands. When the conversion server apparatus 114 is activated, it establishes connections with the delivery server apparatus 117, the advertisement server apparatus 115 and the camera server apparatus 111 by specifying their respective addresses. The addresses of these connection destinations are stored in a connection managing unit 165 and are used to establish the connections. A plurality of delivery server apparatuses 117, advertisement server apparatuses 115 and camera server apparatuses 111 may exist on the network as candidate connection destinations; in this case, one apparatus of each type may be selected.

[0101] The delivery server apparatus 117 is constituted by a sound delivering unit 166 and an image delivering unit 167. It is connected to the conversion server apparatus 114 on a one-to-one basis, but may be connected to, and exchange data with, a plurality of portable display terminals 119 at a time. The sound delivering unit 166 passes digital sound data from a portable display terminal 119 to the control right managing and sound converting unit 163 of the conversion server apparatus 114, and delivers the response (digital sound data) of the camera server apparatus 111 sent back from the control right managing and sound converting unit 163 to the requesting portable display terminal 119. The image delivering unit 167 delivers the MPEG4 image stream from the conversion server apparatus 114 simultaneously to the plurality of portable display terminals 119 connected to the delivery server apparatus 117.

[0102] The portable display terminal 119 is constituted by an operation control unit 171 and a display control unit 172. The operation control unit 171 sends to the delivery server apparatus 117 the tone signal (output as digital sound data) that is generated as the sound of a key when a key of the key button unit 142 of FIG. 4 is pressed, and the display control unit 172 performs control to display, on the display unit 141, images and characters of the MPEG4 stream and other data sent from the delivery server apparatus 117.

[0103] First, the operation in FIG. 5 will be described in more detail, focusing on the flow of images. The camera server apparatus 111 delivers a Motion JPEG-compressed image captured by the image sensing unit 121 to all clients connected to it, namely the display operation terminal 112 and the conversion server apparatus 114. Note that FIG. 5 shows one display operation terminal 112 and one conversion server apparatus 114, but a plurality of apparatuses of each type may of course exist.

[0104] The flow in the image converting unit 164 of the conversion server apparatus 114 is shown in FIG. 6. First, an image is obtained from the camera server apparatus 111 at step S111. The image converting unit 164 decompresses the received Motion JPEG image frame by frame (step S112), subjects the image to advertisement superimposition processing (step S114) if superimposition of an advertisement is necessary (YES in step S113), compresses the image again in MPEG4 (step S115), and sends the image to the delivery server apparatus 117 (step S116). In the advertisement superimposition processing at step S114, the PTZ values (pan angle, tilt angle and zoom scale factor) retained in the control right managing and sound converting unit 163 of the conversion server apparatus 114 are passed to the advertisement server apparatus 115, the advertisement information corresponding to the PTZ values is obtained from an advertisement database 170, and the advertisement is superimposed on the image using this advertisement information. The delivery server apparatus 117 delivers the received MPEG4 image stream simultaneously to the plurality of portable display terminals 119 connected to it. Note that, depending on the time period and/or the PTZ values, the advertisement information obtained from the table of FIG. 7 (described later) may have no contents; in such cases no advertisement is superimposed, which corresponds to NO in step S113.
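The per-frame loop of FIG. 6 can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: the decoder, encoder, advertisement lookup and superimposition are passed in as hypothetical callables, because the text does not name any codec library, and a "frame" is reduced to a string so that the control flow (steps S111 to S116) stays visible.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Optional, Tuple

# Hypothetical stand-ins: a real system would plug in an actual Motion JPEG
# decoder and MPEG-4 encoder. Here a decoded "frame" is just a string.
Frame = str
PTZ = Tuple[float, float, float]  # (pan angle, tilt angle, zoom scale factor)


@dataclass
class AdInfo:
    file: str      # advertisement file (telop text, still image or clip)
    position: str  # superimposition position, e.g. "upper"


def convert_stream(
    jpeg_frames: Iterable[bytes],                   # S111: frames from the camera server
    decode_jpeg: Callable[[bytes], Frame],          # S112: decompress one frame
    lookup_ad: Callable[[PTZ], Optional[AdInfo]],   # query to the advertisement server
    current_ptz: Callable[[], PTZ],                 # PTZ values held by unit 163
    superimpose: Callable[[Frame, AdInfo], Frame],  # S114
    encode_mpeg4: Callable[[Frame], bytes],         # S115
) -> Iterator[bytes]:
    """Yield re-encoded frames, ready to send to the delivery server (S116)."""
    for data in jpeg_frames:
        frame = decode_jpeg(data)
        ad = lookup_ad(current_ptz())
        if ad is not None:  # S113: an advertisement has contents for these PTZ values
            frame = superimpose(frame, ad)
        yield encode_mpeg4(frame)
```

Passing the stages as callables keeps the sketch independent of any particular codec, which mirrors the text's statement that the compression formats are not particularly limited.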

[0105] Advertisement information is a combination of an advertisement file and a superimposition position. A database is provided in the advertisement server apparatus 115, from which the advertisement file and the superimposition position can be obtained by an inquiry specifying the current time and the PTZ values of the camera being presented. In the database, a table in the format shown in FIG. 7 is searched in ascending order of entry number, starting from the smallest, and the advertisement file and superimposition position information of the first entry whose time and PTZ ranges match are acquired. An asterisk (*) in FIG. 7 means that no range is specified (the condition always matches). The advertisement files include telop character strings, still images and image sequence clips.
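The first-match search over the FIG. 7 table can be sketched as below. The exact columns of FIG. 7 are not reproduced in the text, so the fields here (a time-of-day window and pan, tilt and zoom ranges) are assumptions, and `None` stands in for the "*" wildcard.

```python
from dataclasses import dataclass
from datetime import time
from typing import Optional, Sequence, Tuple

# None plays the role of "*" (no range specified, always matches).
Range = Optional[Tuple[float, float]]
Hours = Optional[Tuple[time, time]]


@dataclass
class AdEntry:
    number: int     # entry number; smaller numbers are searched first
    hours: Hours    # assumed time-of-day window (not crossing midnight)
    pan: Range
    tilt: Range
    zoom: Range
    ad_file: str
    position: str


def in_range(value: float, rng: Range) -> bool:
    return rng is None or rng[0] <= value <= rng[1]


def in_hours(now: time, hours: Hours) -> bool:
    return hours is None or hours[0] <= now <= hours[1]


def find_ad(table: Sequence[AdEntry], now: time,
            pan: float, tilt: float, zoom: float) -> Optional[Tuple[str, str]]:
    """Return (file, position) of the first matching entry, or None."""
    for e in sorted(table, key=lambda e: e.number):
        if (in_hours(now, e.hours) and in_range(pan, e.pan)
                and in_range(tilt, e.tilt) and in_range(zoom, e.zoom)):
            return e.ad_file, e.position
    return None  # no contents: nothing is superimposed (NO in step S113)
```

Placing an all-wildcard entry last gives a default advertisement; omitting it lets some time periods or camera positions carry no advertisement at all.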

[0106] The still images and image sequence clips have α-plane information, so information can be superimposed in such a manner that the background image partially shows through as required. In the case of a moving image, the information is superimposed in synchronization, frame by frame. Although advertisement information is superimposed in the first embodiment, the information is not limited to advertisements; any information that needs to be added to the image in midstream may be superimposed. For example, control state information obtained from the camera server apparatus with which connection is established, such as the number of clients waiting for the control right, the time remaining until the control right is acquired, and the pan/tilt/zoom values, may be superimposed. In the advertisement superimposition processing at step S114, the superimposition position (upper, lower, left, right, center, etc.) and the display size of the advertisement information (large/medium/small) are determined based on the superimposition position information.
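Superimposition with an α plane amounts to per-pixel blending, out = α·fg + (1 − α)·bg with α in [0, 1]. The sketch below uses grayscale images stored as nested lists and a hypothetical mapping from the named positions to a top-left offset; display-size scaling (large/medium/small) is omitted for brevity.

```python
from typing import List, Tuple

Image = List[List[float]]  # grayscale pixel values, row-major


def anchor(position: str, bg_w: int, bg_h: int,
           fg_w: int, fg_h: int) -> Tuple[int, int]:
    """Top-left (x, y) of the overlay for a named position (assumed mapping)."""
    cx, cy = (bg_w - fg_w) // 2, (bg_h - fg_h) // 2
    return {
        "upper": (cx, 0),
        "lower": (cx, bg_h - fg_h),
        "left": (0, cy),
        "right": (bg_w - fg_w, cy),
        "center": (cx, cy),
    }[position]


def superimpose(bg: Image, fg: Image, alpha: Image, position: str) -> Image:
    """Alpha-blend fg onto a copy of bg: out = a*fg + (1 - a)*bg."""
    out = [row[:] for row in bg]  # leave the camera frame itself untouched
    x0, y0 = anchor(position, len(bg[0]), len(bg), len(fg[0]), len(fg))
    for y, (fg_row, a_row) in enumerate(zip(fg, alpha)):
        for x, (f, a) in enumerate(zip(fg_row, a_row)):
            b = out[y0 + y][x0 + x]
            out[y0 + y][x0 + x] = a * f + (1 - a) * b
    return out
```

For a clip, the same function would be applied to each camera frame with the clip frame and α plane of the same index, which is one way to keep the overlay synchronized frame by frame.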

[0107] The delivery server apparatus 117 has a telephone number as its network interface on the side of the portable display terminal 119. When the portable display terminal 119 calls this telephone number, connection is established, an image is passed from the delivery server apparatus 117, and the image is decompressed and displayed by the display control unit 172. Even when a plurality of portable display terminals 119 connect at the same time, they can all use the same telephone number. The delivery server apparatus 117 has the capacity to pass images, and sound bidirectionally as described later, to a plurality of portable display terminals 119. Instead of connection by telephone number, designation by IP address in internet connection, such as the i-mode(R) service from NTT DoCoMo Co., Ltd., or URL (Uniform Resource Locator) designation, as used for specifying connection destinations on the WWW, may be used.

[0108] The operation in FIG. 5 will now be described in detail, focusing on the flow of control. FIG. 8 shows the flow of operations from acquisition of the camera control right to issuance of camera control commands. When camera control is to be performed, each of the portable display terminal 119, the conversion server apparatus 114 and the display operation terminal 112 issues a request for the control right to the camera server apparatus 111 (step S121). After the control right is acquired (YES in step S122), it repeatedly sends control commands for operating the camera server apparatus 111 and receives the responses (steps S125 and S126) until the control right is lost (NO in step S124). In this way, the camera control flow is basically the same for all of these apparatuses. The flow for the plurality of connected portable display terminals 119 differs, however, in that the bidirectional sound data channel is used.

[0109] The request for the camera control right, camera control commands such as pan angle, tilt angle and zoom changes, and the responses thereto are shown in FIG. 9. In the case of the portable display terminal 119, however, these commands are not issued directly; instead, sound data corresponding to a control command is issued using the various key buttons of the key button unit 142 of FIG. 4, and the conversion server apparatus 114 converts it into the camera control command shown in FIG. 9 to perform camera control. The same process is carried out for the request for the control right.

[0110] The flow of a control command is shown in FIG. 10. When key buttons corresponding to control operations in the key button unit 142 of the portable display terminal 119 are pressed, a tone signal sound (control command sound) is generated. In the operation control unit 171, the sound is encoded with GSM AMR or the like into digital sound data and passed to the sound delivering unit 166 of the delivery server apparatus 117. The sound delivering unit 166 simply passes the sound data on to the control right managing and sound converting unit 163 of the conversion server apparatus 114.
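The text does not specify how the key tone is recognized. Assuming the keypad emits standard DTMF tones and that the AMR data has already been decoded to PCM samples at 8 kHz, the sound converting unit could identify each key with the Goertzel algorithm, as sketched here. The DTMF frequency grid is standard; the sample rate, block length and amplitude are assumptions.

```python
import math
from typing import List, Sequence

RATE = 8000                      # assumed PCM sample rate (narrowband telephony)
ROWS = [697, 770, 852, 941]      # DTMF low-group frequencies (Hz)
COLS = [1209, 1336, 1477, 1633]  # DTMF high-group frequencies (Hz)
KEYS = ["123A", "456B", "789C", "*0#D"]


def tone(key: str, n: int = 205) -> List[float]:
    """Synthesize n samples of the DTMF tone for one key (test signal)."""
    r = next(i for i, row in enumerate(KEYS) if key in row)
    c = KEYS[r].index(key)
    f1, f2 = ROWS[r], COLS[c]
    return [0.5 * math.sin(2 * math.pi * f1 * t / RATE)
            + 0.5 * math.sin(2 * math.pi * f2 * t / RATE) for t in range(n)]


def goertzel_power(samples: Sequence[float], freq: float) -> float:
    """Squared magnitude of the signal's DFT at freq (Goertzel recurrence)."""
    coeff = 2 * math.cos(2 * math.pi * freq / RATE)
    s1 = s2 = 0.0
    for x in samples:
        s1, s2 = x + coeff * s1 - s2, s1
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def detect_key(samples: Sequence[float]) -> str:
    """Pick the strongest low-group and high-group frequency."""
    r = max(range(4), key=lambda i: goertzel_power(samples, ROWS[i]))
    c = max(range(4), key=lambda i: goertzel_power(samples, COLS[i]))
    return KEYS[r][c]
```

A production detector would also check absolute energy and twist thresholds to reject speech and silence; those checks are left out of this sketch.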

[0111] The control right managing and sound converting unit 163 obtains this sound data, converts it into the corresponding camera control command, and issues the command to the camera server apparatus 111, thereby performing camera control. The response to camera control flows in the opposite direction to the control command. A table of correspondence between key buttons and camera control commands is shown in FIG. 11A. A combination of key button operations generates a control command sound; pan/tilt/zoom values are entered with the digit keys. FIG. 12 shows the flow of the response operation of the camera server apparatus 111 to a control command. When the camera server apparatus 111 determines that the requesting apparatus has the control right (step S131), it accepts the control command (step S132) and sends response information (step S133). If, on the other hand, the camera server apparatus 111 determines that the received control command comes from an apparatus without the control right, it sends a response indicating that control cannot be accepted (step S134).
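The camera server's decision in FIG. 12 reduces to a check of the command's source against the current control right holder. In this minimal sketch, only panning is modeled, and the response strings and addresses are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CameraServer:
    holder: Optional[str] = None  # address of the client holding the control right
    pan: int = 0                  # current pan position in minimum control units

    def handle(self, source: str, pan_delta: int) -> str:
        if source != self.holder:          # S131: sender lacks the control right
            return "control not accepted"  # S134
        self.pan += pan_delta              # S132: accept the relative command
        return f"pan={self.pan}"           # S133: response with the resulting value
```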

[0112] A table of correspondence between responses from the camera server apparatus 111 and the response sounds played back on the portable display terminal 119 is shown in FIG. 11B. When a response arrives, a sound reading aloud the corresponding words in the table is generated. In FIG. 11B, the symbols θ, φ and z denote the pan angle, the tilt angle and the zoom scale factor, respectively.

[0113] An example of camera control follows. When the three key buttons 4, 2 and 0 are pressed in a row with the control right already acquired, the conversion server apparatus 114 generates the camera control command shown in FIG. 13, which pans the camera 20 units to the left, and passes it to the camera server apparatus 111. Here, −20 means panning 20 units to the left from the current position; +20 would indicate panning 20 units to the right.
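The conversion from a key sequence to a relative command can be sketched as follows. Only "4 means pan left" can be inferred from the example above; the other key assignments, the type identifier strings and the addresses are hypothetical placeholders for the FIG. 11A table and the FIG. 9 command format. The sign convention follows the text: a negative pan value means panning left.

```python
from dataclasses import dataclass

# Hypothetical key assignment standing in for FIG. 11A; only "4" = pan left
# is implied by the worked example in the text.
KEY_TO_COMMAND = {
    "4": ("PAN", -1), "6": ("PAN", +1),
    "2": ("TILT", +1), "8": ("TILT", -1),
    "3": ("ZOOM", +1), "9": ("ZOOM", -1),
}


@dataclass
class CameraCommand:
    source: str       # address of the issuing apparatus
    destination: str  # address of the camera server apparatus
    kind: str         # identifier string for the command type
    value: int        # relative change in minimum control units


def keys_to_command(keys: str, source: str, destination: str) -> CameraCommand:
    """Convert a key sequence such as "420" into a relative command."""
    kind, sign = KEY_TO_COMMAND[keys[0]]  # first key selects the operation
    amount = int(keys[1:])                # remaining digit keys give the amount
    return CameraCommand(source, destination, kind, sign * amount)
```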

[0114] In the camera control commands and responses of FIG. 9, the first and second items, namely the source address and the destination address, are the addresses of the source and destination apparatuses of the command or response, and the third item is a character string identifying the type of command or response. Changes of the pan and tilt angles and of the zoom scale factor are each specified as relative values. A + sign denotes right pan, upward tilt and zoom scale-up, respectively, and a − sign denotes left pan, downward tilt and zoom scale-down, respectively. Values are expressed in units of the minimum control amount. In the camera control response, the pan and tilt angles and the zoom factor resulting from the control are sent back as values.

[0115] Each response is normally sent back only to the portable display terminal that issued the control command sound; the camera control response alone is sent back to all connected portable display terminals, so as to notify them by sound of the pan/tilt/zoom state of the camera. The data format of the sound exchanged between the delivery server apparatus 117 and the conversion server apparatus 114 is shown in FIG. 14.

[0116] Data may be divided into small packets when sent, but basically data in the format described above is exchanged bidirectionally. When sound data is passed from the delivery server apparatus 117 to the conversion server apparatus 114, the data consists of the digital sound data corresponding to the control command sound and the identifier (telephone number) of the portable display terminal 119 that issued it. When sound data is passed from the conversion server apparatus 114 to the delivery server apparatus 117, the data consists of the digitized reading sound and the identifier (telephone number) of the portable display terminal to which the sound is to be sent. If sound data is to be sent to all connected portable display terminals 119, as with the camera control response, it must be indicated unambiguously that the data is not intended for a specific terminal, for example by using the number 0 as the identifier (telephone number), meaning all portable display terminals, as shown in FIG. 15.
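The routing rule on the delivery server side can be sketched as below: a packet pairs a terminal identifier with sound data, and the identifier "0" addresses every connected terminal. The byte layout of FIG. 14 is not given in the text, so the packet is modeled as a plain record, and the telephone numbers in the test are made up.

```python
from dataclasses import dataclass
from typing import Dict, List

BROADCAST = "0"  # identifier meaning "all connected portable display terminals"


@dataclass
class SoundPacket:
    terminal_id: str  # telephone number of the source or destination terminal
    sound: bytes      # digitized sound (e.g. encoded speech frames)


def route(packet: SoundPacket, connected: List[str]) -> Dict[str, bytes]:
    """Map each destination terminal to the sound it should receive."""
    if packet.terminal_id == BROADCAST:
        return {tel: packet.sound for tel in connected}
    if packet.terminal_id in connected:
        return {packet.terminal_id: packet.sound}
    return {}  # terminal no longer connected: drop the packet
```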

[0117] The outline of the control right request flow in the control right managing and sound converting unit 163 of the conversion server apparatus 114 is shown in FIG. 16. The conversion server apparatus 114 maintains a queue of identifiers (telephone numbers) of portable display terminals 119. When a portable display terminal 119 newly requests the control right by sound, the conversion server apparatus 114 converts the request into the corresponding camera control command and sends it to the camera server apparatus 111 (step S171). If the control right can be acquired immediately (YES in step S172), processing proceeds to step S178 described later. If not (NO in step S172), the identifying telephone number is registered at the end of the queue (step S173). When the camera server apparatus 111 provides a notice indicating whether or not the client could be added to the control right queue (step S174), a sound corresponding to that notice is sent to the portable display terminal 119 (step S175). Thereafter, when a control right assignment response is sent back from the camera server apparatus 111 (YES in step S176), the identifying telephone number is taken from the head of the queue (step S177), and a corresponding sound is sent back to that portable display terminal 119 (step S178). At step S179, control from the portable display terminal 119 whose telephone number has acquired the control right is accepted.
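The queue handling of FIGS. 16 and the termination notice that follows it can be sketched with a FIFO of telephone numbers. Messages to the camera server and sounds to terminals are reduced to method calls and a list of (telephone number, text) pairs. The message texts are hypothetical, and the step S174 notice is simplified to always report success.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple


class ControlRightManager:
    """Queue of terminal identifiers waiting for the camera control right."""

    def __init__(self) -> None:
        self.queue: Deque[str] = deque()
        self.holder: Optional[str] = None
        self.sent: List[Tuple[str, str]] = []  # (telephone number, sound text)

    def request(self, tel: str, granted_now: bool) -> None:
        # S171: request forwarded to the camera server; granted_now is its answer.
        if granted_now:                        # S172 YES
            self._grant(tel)
        else:
            self.queue.append(tel)             # S173: register at the end
            self.sent.append((tel, "queued"))  # S174-S175 (simplified)

    def on_assignment(self) -> None:
        # S176: the camera server assigns the control right to the next waiter.
        if self.queue:
            self._grant(self.queue.popleft())  # S177: take from the head

    def _grant(self, tel: str) -> None:
        self.holder = tel
        self.sent.append((tel, "control right acquired"))  # S178

    def accepts(self, tel: str) -> bool:
        return tel == self.holder  # S179: only the holder's commands pass

    def on_expired(self) -> None:
        # S180-S181: tell the holder that its control right has ended.
        if self.holder is not None:
            self.sent.append((self.holder, "control right ended"))
            self.holder = None
```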

[0118] When the camera server apparatus 111 provides a notice that the control right has terminated, for example because its effective period has expired (step S180), a sound indicating the termination of the control right is sent to the portable display terminal 119 that held the control right (step S181).

[0119] Note that the physical configuration of the apparatuses providing the functions of the conversion server apparatus 114, the delivery server apparatus 117 and the advertisement server apparatus 115 is not particularly limited; for example, all of these functions may be performed on the same apparatus as long as each function can be carried out.

[0120] According to the first embodiment, by placing the conversion server apparatus 114 in the path through which the image is delivered, an image with additional information such as advertisement information superimposed on it can be delivered only to specific terminals, and the camera server apparatus 111 does not need to hold content dedicated to the portable display terminal 119. In addition, in conjunction with the advertisement server apparatus 115, the additional information can be switched according to the time and the camera control values (PTZ values) before being superimposed. Furthermore, not only still images but also moving images and text can be used as the additional information.

[0121] Furthermore, in the first embodiment, even for a controlled apparatus such as the camera server apparatus 111 that cannot accept control directly from the key buttons of a cellular phone or the like, acquisition of the camera control right and camera control operations can be performed with key buttons, because the conversion server apparatus 114 converts the sound data of the key buttons into control commands. The result of control can also be known by sound, because the response from the camera server apparatus 111 is converted into sound. The pan/tilt/zoom state of the camera can likewise be known by sound.

[0122] <Second Embodiment>

[0123] In the first embodiment, the camera with which connection is established is determined at startup of the delivery server apparatus 117 and the conversion server apparatus 114. In the second embodiment, the camera server apparatus 111 to which the conversion server apparatus 114 is connected can be switched from outside. A method of switching the camera server apparatus 111 from the portable display terminal 119 will be described here.

[0124] The method is basically the same as that of the first embodiment, but the operation of the conversion server apparatus 114 differs slightly, so only the differences from the first embodiment will be described. The flow of camera switching control as seen from the portable display terminal 119 is shown in FIG. 17. A camera switching command is issued from the portable display terminal 119 connected to the delivery server apparatus 117. The camera switching command is specified by a combination of key buttons of the key button unit 142 shown in FIG. 4; in this case, “#” is pressed (step S191). The resulting digital sound then travels to the control right managing and sound converting unit 163 of the conversion server apparatus 114, in the same way as the request for the control right and the camera control commands in the first embodiment.

[0125] If this sound is recognized as a command to change camera server apparatuses, the control right managing and sound converting unit 163 responds by sound with a request for a password (step S192). The password is then entered at the portable display terminal 119 and, if it is correct (step S193), a sound response is sent back prompting entry of the number of the camera to switch to (step S194). The control right managing and sound converting unit 163 holds the camera number-camera name (with sound data)-camera address correspondence table shown in FIG. 18, and uses it to obtain the response sound and the address of the camera server apparatus 111.

[0126] When the camera number is entered (step S195), the control right managing and sound converting unit 163 converts it into the address of the corresponding camera server apparatus 111 (e.g. 100.20.30.102) using the table of FIG. 18, terminates the connection previously established with the current camera server apparatus 111, and establishes a new connection with the camera server apparatus 111 at address 100.20.30.102. In this way, the camera server apparatus 111 can be switched from the portable display terminal 119. Finally, the portable display terminal 119 is notified of the new camera at step S196.
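The password check and FIG. 18 table lookup can be sketched as follows. The camera names, the second address (100.20.30.101) and the password are hypothetical, and the actual disconnect and reconnect are reduced to updating the stored address.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class CameraEntry:
    name: str     # camera name (spoken back to the terminal)
    address: str  # IP address of the camera server apparatus


class CameraSwitcher:
    def __init__(self, table: Dict[str, CameraEntry], password: str, current: str):
        self.table = table        # camera number -> entry (FIG. 18)
        self.password = password  # hypothetical switching password
        self.connected = current  # address of the current camera server

    def switch(self, password: str, number: str) -> str:
        if password != self.password:  # S193: password check
            return "password incorrect"
        entry = self.table.get(number)  # S195: look up the camera number
        if entry is None:
            return "no such camera"
        # A real unit would close the old connection and open a new one here.
        self.connected = entry.address
        return f"switched to {entry.name}"  # S196: notify the terminal
```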

[0127] Note that the camera server apparatus can also be changed by providing the conversion server apparatus 114 with an additional connection port and then changing the address of the connected camera server apparatus to establish connection with a different camera server apparatus 111.

[0128] <Third Embodiment>

[0129] In the third embodiment, advertisements are changed according to the connected camera server apparatus when camera server apparatuses are switched as in the second embodiment.

[0130] The process is basically the same as that of the first embodiment, except that the camera server apparatus 111 connected to the conversion server apparatus 114 can be changed from outside, and that the advertisement information table of the advertisement server apparatus 115 is different. The camera can be changed by the method described in the second embodiment. In this case, the advertisement server apparatus 115 holds the advertisement information table shown in FIG. 19 as its database instead of that shown in FIG. 7. The conversion server apparatus 114 passes to the advertisement database 170 the address of the connected camera server apparatus in addition to the current time and the PTZ values of the camera, and the advertisement information of the first entry in the table of FIG. 19 matching the address of the camera server apparatus, namely the superimposition position and the advertisement file data, is returned to the conversion server apparatus 114. In this way, the advertisement information to be displayed can be switched according to the camera server apparatus 111 that is connected.
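Adding a camera server address column to the lookup gives per-camera advertisements. In the sketch below, only the camera address and a pan range are modeled (time, tilt and zoom columns are omitted for brevity), `None` stands in for "*", and the table is assumed to be stored already in ascending entry-number order.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class Entry:
    camera: Optional[str]               # camera server address; None = "*"
    pan: Optional[Tuple[float, float]]  # pan range; None = "*"
    ad_file: str
    position: str


def find_ad(table: Sequence[Entry], camera: str,
            pan: float) -> Optional[Tuple[str, str]]:
    """Return (file, position) of the first entry matching camera and pan."""
    for e in table:  # assumed to be in ascending entry-number order
        if (e.camera is None or e.camera == camera) and (
                e.pan is None or e.pan[0] <= pan <= e.pan[1]):
            return e.ad_file, e.position
    return None
```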

[0131] <Fourth Embodiment>

[0132] In the fourth embodiment, considering the connection path from the portable display terminal 119 to the camera server apparatus 111 in the configuration of the first embodiment, the delivery server apparatus 117, the conversion server apparatus 114 and the advertisement server apparatus 115 can be selected from among a plurality of paths.

[0133] The configuration of the fourth embodiment is shown in FIG. 20. The apparatuses and terminals are located on a plurality of networks and can each be uniquely identified, as in the first embodiment. The operations of the apparatuses are the same as in the first embodiment, so only the differences at the system level will be described here, with the same reference numerals given to apparatuses having the same configurations as those of FIG. 1.

[0134] The portable display terminal 119 calls the telephone number of a delivery server apparatus to establish connection, displays images and performs camera control; a different connection telephone number is assigned to each of the plurality of delivery server apparatuses 117. Therefore, if connection is established with a different delivery server apparatus 117, a different conversion server apparatus 114 and a different advertisement server apparatus 115 will be used. For example, in FIG. 20, a conversion server apparatus 114a and an advertisement server apparatus 115a are used if connection is established with a delivery server apparatus 117a, and a conversion server apparatus 114b and an advertisement server apparatus 115b are used if connection is established with a delivery server apparatus 117b. If the conversion server apparatuses 114a and 114b are connected to the same camera server apparatus 111, the viewed image is the same and camera control is performed in the same way.

[0135] However, if the contents of the advertisement information tables of FIG. 7 held by the respective advertisement server apparatuses 115 differ, different information may be superimposed even when connection is established with the same camera.

[0136] This configuration can be adopted, for example, when there are too many advertisements to associate with a single camera server apparatus, making it possible to switch advertisement contents even for images from the same camera server apparatus.

[0137] <Fifth Embodiment>

[0138] In the fifth embodiment, information is not superimposed on the image as in the first embodiment but is instead displayed by switching from the image to the information. Only the aspects that differ from the first embodiment will be described.

[0139] In the conversion server apparatus 114, instead of the advertisement superimposition at steps S113 and S114 in FIG. 6, the camera image may be temporarily interrupted and replaced by a picture, image or text retrieved from the advertisement database, by control state information obtained from the camera server apparatus, or the like. As for the timing of switching to advertisement information, the switch may be made while the camera server apparatus 111 is being controlled according to control information, because the image often shifts rapidly and becomes unclear during camera control. For this purpose, the flow shown in FIG. 21 is additionally provided. That is, information indicating that PTZ control has started and the PTZ operation has not yet stopped, in other words, that the camera is currently under PTZ operation, is returned from the camera server apparatus 111 to the conversion server apparatus 114 (step S201). This state is included in the header of each frame of the Motion JPEG image. The state in which the camera is under PTZ operation is detected in the flow of FIG. 21 (step S202), and the advertisement is displayed during the period over which the camera is under PTZ operation (step S203).
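The FIG. 21 switching can be sketched as a per-frame choice driven by the PTZ-operation flag carried in each frame header. The header is reduced here to a hypothetical boolean field, and the frame and advertisement images to placeholder strings.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class JpegFrame:
    ptz_moving: bool  # hypothetical header flag: camera is under PTZ operation
    image: str        # stand-in for the decoded picture


def select_output(frames: Iterable[JpegFrame], ad_image: str) -> Iterator[str]:
    """Replace frames captured during PTZ operation with the advertisement."""
    for f in frames:        # S201: each frame carries the PTZ state
        if f.ptz_moving:    # S202: camera currently under PTZ operation?
            yield ad_image  # S203: show the advertisement instead
        else:
            yield f.image
```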

[0140] In the fifth embodiment, advertisement information is displayed by switching to it as described above; in addition, advertisement information may be inserted into the image and displayed in the following ways:

[0141] 1) the advertisement information is displayed by switching to it during the period over which the camera control right is awaited;

[0142] 2) when the conversion server apparatus connects to the camera server apparatus, the advertisement information is displayed by switching to it until image data arrives at the conversion server apparatus; and

[0143] 3) the advertisement information is displayed by switching to it periodically.

[0144] Furthermore, in the first to fifth embodiments described above, the displayed information is not necessarily advertisement information; any information that should be superimposed in midstream is acceptable, for example information that must not or cannot be held in the camera server apparatus because of its large data size, or information that is preferably inserted in midstream.

[0145] According to the embodiments described above, by using the conversion server apparatus in the path through which data is delivered, additional information such as advertisement information can be superimposed only for the terminals that require it, an image containing different additional information can be delivered at specific timings, and the camera server apparatus 111 does not need to hold content dedicated to the portable display terminal 119. In addition, in conjunction with the advertisement server apparatus, different information can be added according to the time and camera control values (PTZ values), and then superimposed and displayed. Furthermore, not only still images but also moving images, text and the like can be used as the additional information.

[0146] <Sixth Embodiment>

[0147] The sixth embodiment provides an information delivery system in which an image server that controls a camera and sends an image, a sound server that sends sound, and a relay server that codes the data from the image server and the sound server into an image with sound and sends it to a reception terminal are placed on a network. In this information delivery system, when the reception terminal requests a specific camera image from the relay server, the desired camera image and the sound data predetermined in the relay server are coded into an image with sound and sent back.

[0148] The overall configuration of the information delivery system in the sixth embodiment is shown in FIG. 22. A relay server 211, an image server 212, a sound server 213 and a client 219 are connected to a network 218.

[0149] A camera 214 is connected to the image server 212, and the client 219 can operate the camera 214 and obtain an image via the network 218. This is achieved by, for example, a method in which, when a URL-encoded command is sent to the image server 212 by HTTP (HyperText Transfer Protocol), the image server 212 sends back images of a plurality of frames. Note that many coding methods exist for the image data, such as Motion JPEG, H.261 and MPEG, but the present invention is independent of the coding method.

[0150] The sound server 213 has connected to it a microphone 215 and a sound archive 216 in which sound data is accumulated, and sends sound over the network. The sound data in the sound archive 216 can also be stored in an internal storage device of the sound server 213. Commands can be provided to the sound server 213 in the same manner as to the image server 212, and when a request is sent, the sound server 213 sends back sound data of a fixed time length. Coding methods for sound data include G.711, G.726, G.729 and GSM-AMR, but the present invention is independent of the coding method.

[0151] The client 219 connects to the network 218 by dial-up or broadband connection. When the client 219 requests the relay server 211 to send an image, the relay server 211 requests the image from the image server 212. At the same time, the relay server 211 requests sound data from the sound server holding the sound that corresponds to the image, by referring to a correspondence table 217 between images and sounds prepared in advance. The image server 212 and the sound server 213 send back image data and sound data, respectively, to the relay server 211 in response to the requests. The relay server 211 codes the image data and sound data into a single piece of image data with sound and sends it back to the client 219. The client 219 receives and plays back the data.

[0152] The client 219 may also send a camera control request to the relay server 211 in addition to the image request; in this case, the relay server 211 forwards the request directly to the image server 212 to have it perform camera control.

[0153] The hardware configuration of the servers will now be described with reference to FIG. 23. In FIG. 23, the image server 212, the sound server 213 and the relay server 211 are connected to the network 250.

[0154] The image server 212 comprises a CPU 221, a RAM 222, a ROM 223 and a secondary storage device 226. The image server 212 also comprises a video RAM (VRAM) 225, and a monitor 231 for screen display is connected to it. In addition, the image server 212 comprises a peripheral equipment interface 224 for connecting peripheral equipment, through which a keyboard 232 for performing operations, a pointing device 233 such as a mouse, and the camera 214 with or without a pan head are connected. The image server 212 further comprises a network interface 227 for connection with the network 250. For the peripheral equipment interface 224, specifications such as PS/2, RS-232C, USB and IEEE1394 may be used, but this embodiment does not depend on any particular specification.

[0155] The CPU 221, the RAM 222, the ROM 223, the secondary storage device 226, the VRAM 225, the peripheral equipment interface 224 and the network interface 227 are connected to an internal bus. The configuration of the image server 212 described above can easily be achieved with a commercially available personal computer, but because the image server 212 can be operated from outside via the network, it can equally take the form of a so-called set-top box having none of the VRAM 225, the monitor 231, the keyboard 232 and the mouse 233.

[0156] The sound server 213 is almost identical in configuration to the image server 212; only the input device connected to it differs. The sound server 213 comprises a CPU 241, a RAM 242, a ROM 243, a secondary storage device 244, a VRAM 246, a monitor 251, a peripheral equipment interface 247, a network interface 245, a keyboard 252 and a pointing device such as a mouse 253, together with the microphone 215 and a speaker 254 for sound monitoring. The sound server 213 can also easily be achieved with a commercially available personal computer, or can take the form of a set-top box having none of the VRAM 246, the monitor 251, the keyboard 252, the mouse 253 and the speaker 254. Furthermore, if the sound server 213 holds the sound archive 216 in its internal storage device and no external sound source is used, the microphone 215 need not be connected to it.

[0157] Finally, the relay server 211 has the same configuration as the image server 212 except that the camera 214 with a pan head is not provided, or it has a set-top box configuration having none of the camera 214 with a pan head, the VRAM 225, the monitor 231, the keyboard 232 and the mouse 233; its description is therefore omitted here.

[0158] An example of the software configuration of the sixth embodiment is shown in FIG. 24. An image server process 261 operates in the image server 212, a sound server process 262 in the sound server 213, a request processing process 265, an image reception process 263, a sound reception process 264 and an image and sound transmission process 266 in the relay server 211, and a client process 267 in the client 219. Here, a process means a program unit operating in a multitask operating system.

[0159] The outline of the operation of each process will be described with reference to FIG. 25. At startup, the client process 267 requests an image list from the request processing process 265 of the relay server 211 (S211). The request processing process 265 sends back the image list (S212). The image list includes the information shown in FIG. 26A, whose contents will be described later. The client, on receiving the list, displays the list of images, and the user selects one of the images. The client process 267 then sends the request processing process 265 a connection request for acquiring the image (S213). In the case where the user inputs the connection destination for acquiring the image directly to the client 219, steps S211 and S212 are unnecessary.

[0160] On receiving the connection request for acquiring the image, the request processing process 265 of the relay server 211 selects the sound server 213 and a sound by referring to the correspondence table 217 between images and sounds (S214). It then starts the image reception process 263 with the image server 212 and the camera 214 designated, and starts the sound reception process 264 with the sound server 213 and either the microphone 215 or the sound file name designated. It also starts the image and sound transmission process 266, which codes the received image and sound data into a single piece of image data with sound and sends it. The image reception process 263 requests an image from the image server 212 (S215), and the sound reception process 264 requests a sound from the sound server 213 (S216).

[0161] On receiving the request, the image server process 261 obtains the image from the corresponding camera 214 (S217) and sends it back to the image reception process 263 of the relay server 211. Similarly, the sound server process 262 obtains the corresponding sound data from the microphone 215 or the sound archive 216 and sends it back to the sound reception process 264 (S218). The returned image and sound data are coded into a single piece of image data with sound by the image and sound transmission process 266 (S219) and sent to the client process 267 (S220). The client process 267 receives the image with sound, then decodes and plays it back (S221).

[0162] The information about images and sounds retained by the relay server, and about the correspondence between them, will now be described with reference to FIGS. 26A to 26C. There are three types of information: an image table 271 shown in FIG. 26A, a sound table 272 shown in FIG. 26B and a correspondence table 273 shown in FIG. 26C. The image table 271 assigns an image number and an image name to each camera 214 connected to the image server 212, and manages the IP address and port number of the image server 212 and the camera name as attributes. The client 219 selects an image name to designate the image from a desired camera. Similarly, the sound table 272 assigns a sound number and a sound name to each microphone 215 or file, and manages the IP address and port number of the sound server 213 and the microphone name or file name as attributes.

[0163] The correspondence table 273 shows the correspondence between image numbers and sound numbers, and retains a plurality of sound numbers for each image number. When the user requests an image by designating its name, the relay server 211 determines the image number from the image table 271, refers to that image number in the correspondence table 273 to acquire the corresponding sound number, and refers to that sound number in the sound table 272 to pinpoint the location of the sound on the network. A plurality of sound sources can be registered for one image, and if the user views the image continuously for a long time, sound from these sources is delivered one after another. If a sound cannot be accessed for some reason, the relay server switches to another sound assigned to the same image. In the figure, N/A means that no data exists.
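The three-table lookup with fallback can be sketched as follows. The in-memory representation (dictionaries keyed by number, dataclass rows) and the `is_accessible` callback used to test whether a sound can be reached are illustrative assumptions, not the relay server's actual data structures.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class ImageEntry:          # one row of image table 271
    name: str
    ip: str
    port: int
    camera: str


@dataclass
class SoundEntry:          # one row of sound table 272
    name: str
    ip: str
    port: int
    source: str            # microphone name or file name


def resolve_sound(image_name: str,
                  image_table: Dict[int, ImageEntry],
                  sound_table: Dict[int, SoundEntry],
                  correspondence: Dict[int, List[int]],
                  is_accessible: Callable[[SoundEntry], bool]) -> Optional[SoundEntry]:
    # Image name -> image number (image table 271).
    image_no = next((no for no, e in image_table.items() if e.name == image_name), None)
    if image_no is None:
        return None
    # Image number -> candidate sound numbers in order (correspondence table 273).
    for sound_no in correspondence.get(image_no, []):
        entry = sound_table.get(sound_no)          # None models an N/A entry
        if entry is not None and is_accessible(entry):
            return entry                           # first reachable sound wins
    return None                                    # no usable sound: relay image only


if __name__ == "__main__":
    images = {1: ImageEntry("Harbor", "192.0.2.10", 80, "camera1")}
    sounds = {10: SoundEntry("Guide", "192.0.2.20", 8080, "guide.wav"),
              11: SoundEntry("Live", "192.0.2.20", 8080, "mic1")}
    table = {1: [10, 11]}
    unreachable = {"guide.wav"}
    print(resolve_sound("Harbor", images, sounds, table,
                        lambda s: s.source not in unreachable).name)  # Live
```

Keeping the candidate list ordered lets the same structure serve both purposes named in the text: sequential delivery of several sources and fallback when one source is unreachable.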

[0164] The outline of the operation of the group of servers in the sixth embodiment has been described above. The operation procedures of the processes of the relay server 211, which plays the central role in the sixth embodiment, will now be described in detail with reference to the flowcharts of FIGS. 27 to 30. The relay server 211 consists of the request processing process 265, the image reception process 263, the sound reception process 264 and the image and sound transmission process 266. For the three processes other than the request processing process 265, one process is generated per client, and each process operates independently.

[0165] FIG. 27 is a flowchart showing the procedure of the request processing process 265 of the relay server 211. After starting, initialization is carried out at step S231, and an event is awaited at step S232. When an event occurs, it is processed. Only events from the client process 267 are described here; events dependent on the OS and the like are not.

[0166] If the event is an image request (YES in step S233), whether the client is a new client that has not yet been connected is determined at step S234. If the client has already been connected (NO in step S234), an image request event and a sound request event are issued to the image server 212 and the sound server 213, respectively, at step S235, and processing returns to step S232, where the next event is awaited. If the client has not been connected (YES in step S234), processing proceeds to step S236, where whether the number of connections is equal to or smaller than the maximum number is checked. If the number of connections exceeds the maximum number (NO in step S236), a connection rejection notice is sent to the client at step S237, and processing returns to step S232, where the next event is awaited. The maximum number of connections is determined in advance in view of the processing capacity of the relay server 211.

[0167] If the number of connections does not exceed the maximum number (YES in step S236), the IP address of the client 219 is registered at step S238 as registration processing for the client. If personal information of the client 219 is sent at the same time, that information is also registered. Then, the sound corresponding to the image is determined, the image reception process 263, the sound reception process 264 and the image and sound transmission process 266 are started at steps S239, S240 and S241, respectively, and processing returns to step S232, where the next event is awaited.

[0168] If the event is not a connection request event (NO in step S233), processing proceeds to step S242, where whether the event is a connection termination event is determined. This event may be sent by the client 219, or may be raised as an exception event when the image and sound transmission process 266 cannot send the image and sound to the client. If it is a connection termination event (YES in step S242), processing proceeds to step S243, where connection termination processing is carried out: the image reception process 263, the sound reception process 264 and the image and sound transmission process 266 started when the connection began are terminated. Processing then proceeds to step S244, where the client is deleted from the list of connected clients, and returns to step S232, where the next event is awaited.

[0169] If the event is not a connection termination event (NO in step S242), processing proceeds to step S245, where whether the event is a camera control request event is determined. If it is, processing proceeds to step S246, where the camera control command from the client is transferred to the image server 212; when this is completed, processing returns to step S232, where the next event is awaited.

[0170] If the event is not a camera control request event (NO in step S245), processing proceeds to step S247, where whether the event is an image list request event is determined. If it is, the image list is sent back to the client at step S248, and processing returns to step S232, where the next event is awaited. If the event is not an image list request event (NO in step S247), processing returns directly to step S232, where the next event is awaited.
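The event dispatch of FIG. 27 can be sketched as a single handler function. This is an illustration under stated assumptions: events are modelled as dictionaries with hypothetical keys (`type`, `client`, `command`), the maximum number of connections is a hypothetical constant, and starting or stopping processes is only recorded in a log rather than performed.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

MAX_CONNECTIONS = 4  # hypothetical capacity limit (checked at S236)


@dataclass
class RelayState:
    clients: Dict[str, Any] = field(default_factory=dict)  # client IP -> personal info
    log: List[Tuple[str, Any]] = field(default_factory=list)  # actions taken, for illustration


def handle_event(state: RelayState, event: Dict[str, Any]) -> None:
    kind, client = event["type"], event.get("client")
    if kind == "image_request":                                   # S233
        if client in state.clients:                               # NO in S234: already connected
            state.log.append(("request_image_and_sound", client))  # S235
        elif len(state.clients) >= MAX_CONNECTIONS:               # accepting would exceed the limit
            state.log.append(("reject", client))                  # S237
        else:
            state.clients[client] = event.get("personal_info")    # S238
            state.log.append(("start_processes", client))         # S239-S241
    elif kind == "terminate":                                     # S242
        if client in state.clients:
            state.log.append(("stop_processes", client))          # S243
            del state.clients[client]                             # S244
    elif kind == "camera_control":                                # S245
        state.log.append(("forward_to_image_server", event["command"]))  # S246
    elif kind == "image_list":                                    # S247
        state.log.append(("send_image_list", client))             # S248
    # Any other event is ignored; the caller then waits for the next event (S232).
```

A caller would run `handle_event` in a loop over incoming events, which corresponds to the return to step S232 after each branch.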

[0171] The operation procedures of the image reception process 263 and the sound reception process 264 in the relay server 211 will now be described. FIG. 28 shows the operation procedure of the image reception process, and FIG. 29 shows that of the sound reception process.

[0172] After processing is started, the image reception process 263 waits at step S251 until the image request event is raised by the request processing process 265. When the image request event occurs (YES in step S251), the image is requested from the image server 212 with the camera name designated at step S252, and an image of at least one frame is received at step S253. The number of frames may be requested by the client process 267, or a fixed number of frames may be preset.

[0173] Then, at step S254, whether the image was obtained successfully without an abnormal condition at steps S252 and S253 is determined. An abnormal condition refers to cases such as the image not being received completely because the network was disconnected during reception. If an abnormal condition occurred (NO in step S254), processing proceeds to step S257; if the number of tries is equal to or smaller than the maximum number, processing returns to step S252 and another attempt is made to obtain the image. If the number of tries exceeds the maximum number, processing proceeds to step S258, where an exception event is issued and the processing is terminated.

[0174] If no abnormal condition occurred (YES in step S254), processing proceeds to step S255, where the received image is stored in a buffer. Then, whether a termination command has been issued is checked at step S256. This is the command issued at step S243 in FIG. 27. If the command has been issued, the processing is terminated; otherwise, processing returns to step S251 and continues.
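One pass of the image reception procedure (steps S252 to S258) can be sketched as a bounded retry loop. The `fetch` callable standing in for the HTTP request to the image server, the retry limit, and the use of `OSError` to signal a disconnected network are all assumptions for illustration.

```python
from collections import deque
from typing import Callable, Deque, List

MAX_TRIES = 3  # hypothetical retry limit


class ReceptionError(Exception):
    """Stands in for the exception event issued at step S258."""


def receive_frames(fetch: Callable[[str, int], List[bytes]],
                   camera: str, n_frames: int, buffer: Deque[bytes]) -> int:
    """Request frames until success or until the retry limit is exceeded.

    fetch(camera, n) is a hypothetical call to the image server; it returns the
    received frames or raises OSError when the network fails mid-reception.
    Returns the number of failed tries that preceded the success.
    """
    failures = 0
    while True:
        try:
            frames = fetch(camera, n_frames)            # S252-S253
            if not frames:                              # not even one complete frame
                raise OSError("no complete frame received")
        except OSError:                                 # abnormal condition (NO in S254)
            failures += 1
            if failures > MAX_TRIES:                    # S257: limit exceeded
                raise ReceptionError(f"giving up on {camera}")  # S258
            continue                                    # try again from S252
        buffer.extend(frames)                           # S255
        return failures
```

The outer loop of FIG. 28 (waiting at S251 and checking for the termination command at S256) would call `receive_frames` once per image request event.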

[0175] After processing is started, the sound reception process 264 waits at step S260 until the sound request event is raised by the request processing process 265. When the sound request event is raised (YES in step S260), a sound is requested from the sound server 213 with the microphone 215 or the file name designated at step S261. Then, at step S262, if the requested sound is a sound file or the like, whether the sound has ended is checked; this can be known from the response to the sound request. If the sound has ended, the correspondence table 217 is referred to at step S263, and if there are a plurality of corresponding sound sources, the sound server 213 is requested to select another sound. Then, sound of a fixed duration is received at step S264. This duration is set to correspond to the number of frames received at a time by the image reception process 263.

[0176] Then, at step S265, whether the sound was obtained successfully without an abnormal condition at steps S261 and S264 is determined. An abnormal condition refers to cases such as the sound not being received completely because the network was disconnected during reception. If an abnormal condition occurred (NO in step S265), processing proceeds to step S268; if the number of tries is equal to or smaller than the maximum number, processing returns to step S261 and another attempt is made to obtain the sound. If the number of tries exceeds the maximum number, processing proceeds to step S269, where an exception event is issued and the processing is terminated.

[0177] If no abnormal condition occurred (YES in step S265), processing proceeds to step S266, where the received sound is stored in a buffer. Then, whether a termination command has been issued is checked at step S267. This is the command issued at step S243 in FIG. 27. If the command has been issued, the processing is terminated; otherwise, processing returns to step S260 and continues.
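The two details specific to sound reception, matching the chunk duration to the image frames and moving on to another registered sound when a file ends, can be sketched as below. The `request` callable, its `(data, ended)` return value, and the simplification of checking for the end after each chunk rather than before it are assumptions; error handling is omitted because it mirrors the image reception retry loop.

```python
from typing import Callable, List, Tuple


def receive_sound(request: Callable[[str, float], Tuple[bytes, bool]],
                  candidates: List[str],
                  frames_per_request: int,
                  frame_rate: float,
                  n_chunks: int) -> List[Tuple[str, bytes]]:
    """Collect n_chunks of sound, rotating through candidate sources when one ends.

    request(source, seconds) is a hypothetical call to the sound server returning
    (sound_data, ended), where ended is True once a sound file has reached its end.
    """
    # S264: chunk length matches the frames fetched per image request,
    # e.g. 10 frames at 5 frames/s -> 2.0 seconds of sound.
    seconds = frames_per_request / frame_rate
    idx = 0
    received: List[Tuple[str, bytes]] = []
    for _ in range(n_chunks):
        source = candidates[idx]
        data, ended = request(source, seconds)    # S261, S264
        received.append((source, data))          # S266: store in the sound buffer
        if ended and len(candidates) > 1:         # S262-S263: switch to another sound
            idx = (idx + 1) % len(candidates)
    return received
```

Tying the chunk length to the frame count keeps the image buffer and the sound buffer filling at the same rate, which simplifies pairing them in the transmission process.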

[0178] The operation procedure of the image and sound transmission process 266 will now be described with reference to FIG. 30. After processing is started, whether image and sound data exist in an image buffer and a sound buffer is determined at step S271. If neither exists, processing proceeds to step S272. If, at step S272, the absence of image and sound data has been detected more times than the maximum number of tries, processing proceeds to step S278, where an error is sent to the client 219, and an exception event is raised at step S279 to terminate the processing. If the number of tries is equal to or smaller than the maximum number, step S271 is carried out again after a waiting time elapses.

[0179] If image and/or sound data exist at step S271, processing proceeds to step S273, where coded data is generated from the image and sound as an image with sound. There are several coding methods, such as MPEG, RealVideo and Windows(R) Media, but the present invention is independent of the coding method. Data can be coded even if only one of the image and the sound exists. After the data is coded, the coded data is sent to the client 219 at step S274.

[0180] Then, whether an abnormal condition occurred while sending the data is determined at step S275. If so, whether a predetermined maximum number of send attempts has been exceeded is determined at step S277. If the maximum number has been exceeded, processing proceeds to step S278, where an error is sent, and then an exception event is raised to terminate the processing at step S279. If the maximum number of tries has not been exceeded, processing returns to step S274, where the data is sent again.

[0181] If it is determined at step S275 that no abnormal condition occurred while sending the data, whether the termination command has been issued is determined at step S276. This command may be issued at step S243 in FIG. 27, or may be issued as an exception event by the image reception process 263 or the sound reception process 264. If the termination command has been issued, the processing is terminated. If not, processing returns to step S273, and coding and sending of data continue.
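The transmission procedure of FIG. 30 can be sketched as follows, loosely following the flowchart. The buffers are plain lists, `encode` and `send` are hypothetical callables standing in for the audio/video coder and the network send, `should_stop` models the termination command, and the retry limit is an assumed value. Raising `TransmissionError` stands in for both sending the error (S278) and raising the exception event (S279).

```python
import time
from typing import Any, Callable, List, Optional

MAX_TRIES = 3  # hypothetical limit for both empty-buffer checks and send retries


class TransmissionError(Exception):
    """Stands in for the error notice and exception event of steps S278-S279."""


def transmit(image_buf: List[bytes], sound_buf: List[bytes],
             encode: Callable[[Optional[bytes], Optional[bytes]], Any],
             send: Callable[[Any], None],
             should_stop: Callable[[], bool],
             wait_s: float = 0.0) -> None:
    empty_tries = 0
    while not should_stop():                            # S276: termination command?
        if not image_buf and not sound_buf:             # S271: no data in either buffer
            empty_tries += 1
            if empty_tries > MAX_TRIES:                 # S272
                raise TransmissionError("no image or sound data")
            time.sleep(wait_s)                          # wait, then check S271 again
            continue
        empty_tries = 0
        image = image_buf.pop(0) if image_buf else None
        sound = sound_buf.pop(0) if sound_buf else None
        packet = encode(image, sound)                   # S273: either part may be missing
        for attempt in range(MAX_TRIES + 1):            # S274-S277
            try:
                send(packet)
                break
            except OSError:
                if attempt == MAX_TRIES:
                    raise TransmissionError("send failed")
```

For example, with `encode=lambda i, s: (i, s)` and a list's `append` as `send`, two images and one sound chunk produce the packets `(image1, sound1)` and `(image2, None)`.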

[0182] As is apparent from the above description, the sixth embodiment makes it possible to build a web camera system and an information delivery system capable of adding an explanation of an image, or an advertisement, with sound.

[0183] <Seventh Embodiment>

[0184] The seventh embodiment of the present invention will now be described. In the seventh embodiment, the function of the correspondence table 217 held by the relay server 211 in the sixth embodiment is extended so that more detailed correspondence can be handled. More detailed correspondence means that the image is matched with the sound more precisely using camera parameters such as pan, tilt and zoom, time periods, and personal data such as the age, sex and address of the user. The hardware and software configurations of the seventh embodiment are the same as those of the sixth embodiment; only the correspondence table 217 managed by the relay server 211 and the operation of the request processing process 265 differ. Therefore, only the aspects that differ from the sixth embodiment will be described below.

[0185] Examples of the condition table and correspondence table held by the relay server in this embodiment are shown in FIGS. 31A and 31B. FIG. 31A shows an example of a condition table 281, and FIG. 31B shows an example of a correspondence table 282. In the condition table 281, each row is assigned a number as one condition, and conditions on the time period, camera parameters such as pan, tilt and zoom, and personal information of the user such as age, sex and address are retained as values or ranges of values for each condition number.

[0186] Compared with the correspondence table 273 of the sixth embodiment, the correspondence table 282 adds, for each image number, a column that retains the condition for the correspondence as a condition number. "None" in the condition column means that the correspondence is established unconditionally. When the user designates an image, the correspondence between the image and a sound is permitted only when all accompanying conditions are satisfied. If not all accompanying conditions are satisfied, either no sound is sent, or a sound to be used in that case may be determined in advance.

[0187] The operation procedure of the request processing process 265 operating on the relay server 211 in the seventh embodiment is shown in FIG. 32. In FIG. 32, the same step numbers are given to the same operations as in FIG. 27, and only the differences will be described.

[0188] In processing the image request event of step S233, if conditions on camera parameters exist in the condition table 281, the camera state is obtained at step S280 from the corresponding image server 212, found by referring to the image table 271 of FIG. 26A. This is the processing of obtaining the camera parameters of the camera corresponding to the image desired by the client. Then, an entry consistent with the conditions, such as the camera parameters, is retrieved from the condition table 281, and the sound corresponding to that condition number is selected from the correspondence table 282 of FIG. 31B. The relay server 211 then issues, in step S239, a request for the sound data to the sound server 213 corresponding to the selected sound, and receives the sound data.

[0189] If personal information of the user exists in the condition table 281, the client 219 needs to send the user's personal information. In this case, the personal data is sent together with the image request from the client 219 to the relay server 211. The relay server 211 retrieves an entry consistent with the conditions from the condition table 281 based on the received personal data, and selects the sound corresponding to that condition number from the correspondence table 282. The relay server 211 then issues, in step S239, a request for the sound data to the sound server 213 corresponding to the selected sound, and receives the sound data.

[0190] Likewise, if time information exists in the condition table 281, the relay server 211 retrieves from the condition table 281 an entry whose time period includes the time at which the client 219 requested the image data, and selects the sound of the corresponding condition number from the correspondence table 282. The relay server 211 then issues a request for the corresponding sound data to the sound server 213 corresponding to the selected sound, and receives the sound data at step S239.

[0191] If a camera control request is made by the client (YES in step S245), a camera control command is issued to the image server 212 at step S282. Then, the parameter information of the camera is obtained at step S283, and whether the sound needs to be reconnected is determined at step S284. This check refers again to the condition table 281 of FIG. 31A to determine whether the condition under which the current connection was permitted is still satisfied after the camera has been controlled. If the condition is still satisfied, processing returns to step S232, where the next event is awaited. If not, reconnection is required: after referring again to the condition table 281 of FIG. 31A, the sound number of the matching condition number is found from the correspondence table 282 of FIG. 31B, and reconnection processing is carried out at step S285. This is the processing of designating the sound server 213 and the microphone 215 or the file to restart the sound reception process.
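The condition matching of the seventh embodiment, including the reconnection check of step S284, can be sketched with two small dictionaries. The table layout is an assumption: each condition is a mapping from attribute name to an inclusive `(lower, upper)` range, and each image's correspondence list is checked in order, with the unconditional ("None") entry placed last so that it acts as the predetermined fallback sound.

```python
from datetime import time
from typing import Any, Dict, List, Optional, Tuple

Conditions = Dict[int, Dict[str, Tuple[Any, Any]]]       # condition table 281 (assumed layout)
Correspondence = Dict[int, List[Tuple[Optional[int], int]]]  # table 282: image -> [(qid, sound)]


def satisfies(condition: Dict[str, Tuple[Any, Any]], context: Dict[str, Any]) -> bool:
    """True when every attribute of the condition lies within its [lower, upper] range.
    An attribute missing from the context (e.g. no personal data sent) fails the condition."""
    for attr, (low, high) in condition.items():
        value = context.get(attr)
        if value is None or not (low <= value <= high):
            return False
    return True


def select_sound(image_no: int, correspondence: Correspondence,
                 conditions: Conditions, context: Dict[str, Any],
                 default: Optional[int] = None) -> Optional[int]:
    for qid, sound_no in correspondence.get(image_no, []):
        if qid is None or satisfies(conditions[qid], context):
            return sound_no
    return default                       # no condition satisfied: send no sound or a preset one


def needs_reconnect(image_no: int, current_sound: Optional[int],
                    correspondence: Correspondence, conditions: Conditions,
                    context: Dict[str, Any]) -> bool:
    # S284: after camera control, re-evaluate with the new camera parameters.
    return select_sound(image_no, correspondence, conditions, context) != current_sound


if __name__ == "__main__":
    conds = {1: {"pan": (-30, 30), "time": (time(9, 0), time(17, 0))}, 2: {"age": (20, 39)}}
    corr = {1: [(1, 101), (2, 102), (None, 103)]}
    ctx = {"pan": 10, "time": time(10, 0), "age": 25}
    print(select_sound(1, corr, conds, ctx))  # 101
```

Here `context` merges the camera parameters obtained at step S280 or S283, the request time, and any personal data sent by the client, so one matching routine serves all three cases described in the text.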

[0192] As described above, according to the seventh embodiment, the correspondence can be determined in more detail by designating conditions such as the time, camera parameters and personal information of the user. As a result, what is displayed on the screen can be explained accurately when the image is explained with sound, and an effective sound can be added to an image in sound advertisements and the like.

[0193] <Eighth Embodiment>

[0194] The eighth embodiment of the present invention will now be described. In the eighth embodiment, a mobile terminal such as a cellular phone can be used in addition to the PC client of the sixth or seventh embodiment. The system configuration of the eighth embodiment is shown in FIG. 33.

[0195] FIG. 33 shows a mobile communication network and a cellular phone client in addition to the entities shown in FIG. 22. In FIG. 33, the same reference numbers are given to the same components as in FIG. 22, their descriptions are omitted, and only the differences will be described. A portable terminal client 292 connects to a gateway of a delivery center 290 of a mobile communication carrier via a mobile communication network 291. The gateway converts data between the communication method of the mobile communication network and that of the network 218 so that information can be exchanged. Communication methods between the portable terminal client 292 and the gateway include a line exchange (circuit-switched) method and a packet communication method.

[0196] If a cellular phone is used as the terminal, a telephone number is assigned in the gateway of the delivery center 290 to each image sensed by each camera 214. When a call is made from the terminal to the telephone number corresponding to an image, the gateway of the delivery center 290 requests the corresponding image from the relay server 211. The image data with sound from the relay server 211 is then converted by the gateway into an image stream for mobile communication, so that the data can be received and played back by the terminal.

[0197] For connection by the packet communication method, if a well-known service for playing back video clips is available, the relay server 211, when the camera 214 is designated to it, creates and sends back a video clip combining the image and the corresponding sound, and this video clip can be received via the gateway and played back by the terminal.

[0198] Also, if line exchange (circuit-switched) and packet connections can be used simultaneously, the camera can be operated on the screen of the cellular phone terminal, making it possible to obtain still images while receiving sound data. In this case, the gateway divides the image data with sound sent back from the relay server 211 into still image data for packet communication and sound data for the circuit-switched line, and sends them to the terminal.

[0199] As described above, according to the eighth embodiment, a web camera with sound can be operated in the sixth embodiment using, as a client, a portable terminal on a mobile communication network.

[0200] <Ninth Embodiment>

[0201] The ninth embodiment of the present invention will now be described. In the ninth embodiment, the correspondence table 217 (273 or 282) between images and sounds and the condition table 281 held by the relay server 211 in the sixth and seventh embodiments can be changed. This is achieved by sending requests for addition, update, deletion and the like to the relay server 211.

[0202] a) Request for addition and update 1 to correspondence table:
http://host-address:port/addctbl?video=id&sound=id[&sound=id ...]
where the id in video=id represents an image number, and the id in sound=id represents a sound number (a plurality of sound numbers can be designated).
Reply:
HTTP/1.0 200 OK
Content-Type: text/plain\r\n
OK
video=video_id
where video_id represents an image number.

[0203] b) Request for deletion from correspondence table:
http://host-address:port/delctbl?video=id[&video=id ...]
where the id in video=id represents an image number (a plurality of image numbers can be designated).
Reply:
HTTP/1.0 200 OK
Content-Type: text/plain\r\n
OK

[0204] For the request for addition and update to the correspondence table, an image number and the sound numbers corresponding to the image are designated; a plurality of sound numbers can be designated. For the request for deletion, image numbers are designated and the corresponding data is deleted; a plurality of image numbers can be designated. If a client makes a connection request for a deleted image number, either only the image is relayed, or a default sound determined in advance is associated with it.

[0205] Next, for addition and update to and deletion from the condition table 281 shown in FIG. 31A, the following requests and replies can be defined and executed.

[0206] c) Request for addition and update to condition table:
http://host-address:port/addqtbl?qid=num&attr=val1+val2[&attr=val1+val2 ...]
where num in qid=num represents a condition number, attr in attr=val1+val2 represents an attribute name, and val1 and val2 represent the lower and upper limits, respectively. Examples of attr include pan, tilt, zoom, time, age and sex.
Reply:
HTTP/1.0 200 OK
Content-Type: text/plain\r\n
OK
qid=qualify_id
where qualify_id represents a condition number.

[0207] d) Request for deletion from the condition table:
http://host-address:port/delqtbl?qid=num[&qid=num ...]
where num in qid=num represents a condition number (a plurality of condition numbers can be designated).
Reply (when successful):
HTTP/1.0 200 OK
Content-Type: text/plain
OK

[0208] In the case of addition and update to the condition table 281, if a condition with the designated condition number exists, it is updated; if it does not exist, it is added. If no condition number is designated, a new condition number is assigned and sent back. In the case of deletion from the condition table, the condition corresponding to the designated condition number is deleted, if it exists.
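The add, update and delete rules for the condition table can be illustrated with the minimal Python sketch below. It assumes numeric attribute values (so an attribute such as sex would need a different encoding), it assumes that an update replaces all limits of an existing condition rather than merging them, and it assigns new condition numbers as one more than the largest existing number. The names handle_qtbl_request and satisfies are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical condition table: condition number -> {attribute: (lower, upper)}
conditions = {}


def handle_qtbl_request(url):
    """Apply an addqtbl or delqtbl request and return the reply body text."""
    parts = urlsplit(url)
    pairs = []
    for item in parts.query.split("&"):
        if item:
            name, _, value = item.partition("=")
            pairs.append((name, value))

    if parts.path == "/addqtbl":
        qid, ranges = None, {}
        for name, value in pairs:
            if name == "qid":
                qid = int(value)
            else:
                # attr=val1+val2: '+' separates the lower and upper limits.
                low, _, high = value.partition("+")
                ranges[name] = (float(low), float(high))
        if qid is None:
            # No condition number designated: assign a new one and send it back.
            qid = max(conditions, default=0) + 1
        conditions[qid] = ranges  # update if qid exists, otherwise add
        return f"OK\r\nqid={qid}"
    if parts.path == "/delqtbl":
        for name, value in pairs:
            if name == "qid":
                conditions.pop(int(value), None)  # delete only if present
        return "OK"
    raise ValueError(f"unsupported request path: {parts.path}")


def satisfies(qid, state):
    """True if every attribute limited by condition qid lies within its range."""
    return all(
        name in state and low <= state[name] <= high
        for name, (low, high) in conditions[qid].items()
    )
```

A relay server could then call satisfies(qid, {"pan": 10, "zoom": 2}) with the current camera state to decide whether a conditional correspondence applies.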

[0209] Next, when the conditions accompanying the entries of the correspondence table 282 of FIG. 31B are to be updated by addition and update, an attribute designating the associated condition can be added to the request form described above. That is, request a), addition and update 1 to the correspondence table 273, is modified as follows.

[0210] e) Request for addition and update 2 to the correspondence table:
http://host-address:port/addctbl?video=id[&qid=id][&sound=id[&sound=id ...]]
where the id in video=id represents an image number, the id in sound=id represents a sound number (a plurality of sound numbers can be designated), and the id in qid=id represents a condition number.

[0211] To carry out these updates, a procedure for updating the correspondence table and the condition table is added to the operation procedure of the relay server 211 in FIG. 32. That is, if one of the requests a) to e) described above is received during event processing, alteration processing such as addition and update to and deletion from the correspondence table 273 or 282 and/or the condition table 281 is carried out, and the next event is awaited.

[0212] As described above, according to the ninth embodiment, the correspondences and their associated conditions can be altered dynamically by applying addition, update and deletion to the image-sound correspondence table and the condition table used by the relay server in the sixth and seventh embodiments.

[0213] As described above, a system can be built that receives image data and sound data from an image sending apparatus and a sound sending apparatus, respectively, combines the received image and sound data, and delivers the resulting image data with sound to a reception apparatus.

[0214] <Tenth Embodiment>

[0215] FIG. 34 shows an example of an image delivery system in the tenth embodiment that uses a relay server as a conversion apparatus for converting the coding method of image data.

[0216] In FIG. 34, a camera 301 captures images in real time, and an image delivery server 302 converts the image data to the QVGA or 1/16 VGA size and codes it by the JPEG method. A relay server 303 converts the QVGA or 1/16 VGA size image data to QCIF size image data, and also converts the JPEG image data to MPEG image data in a manner described later, so that the image can be delivered over a cellular phone communication network 306. With this system, the image from the camera can be delivered to cellular phones 304a, 304b, 304c, and so on.

[0217] In the tenth embodiment described below, the image is coded by the JPEG coding method before coding conversion and by the MPEG4 image coding method after conversion. However, the procedure is also effective for other combinations of coding methods whose processing includes block division, orthogonal conversion and entropy coding, and the coding methods before and after conversion may even be the same.

[0218] FIG. 35 is a flowchart showing the procedure by which the relay server 303 converts coded data in the tenth embodiment of the present invention, converting image data of the QVGA display size (320 pixels wide by 240 pixels high) into image data of the smaller QCIF display size (176 pixels wide by 144 pixels high).

[0219] Note that the procedure described below is also effective for other combinations of image sizes, as long as the image size after coding conversion is smaller than the image size before conversion.

[0220] At step S311 of FIG. 35, the QVGA size JPEG coded image data is entropy-decoded (Huffman-decoded or arithmetic-decoded) to create QVGA size orthogonal conversion data (more specifically, the orthogonal conversion data of each block of each MCU (Minimum Coding Unit) included in the QVGA size image area).

[0221] At step S312, as described later with reference to FIG. 36, a QCIF size area is clipped from the QVGA size area along MCU borderlines (more generally, block borderlines) to obtain QCIF size orthogonal conversion data (more specifically, the orthogonal conversion data of each block of each MCU included in the QCIF size image area).

[0222] The clipping of QCIF size image data from QVGA size image data will now be described with reference to FIG. 36.

[0223] FIG. 36 shows the correspondence between image areas when the QCIF size image area is clipped from the QVGA size image area. Suppose the upper left-hand corner of the QVGA size image area is at (0, 0) and its lower right-hand corner is at (319, 239). If the upper left-hand corner of the area to be clipped is at (x1, y1), its lower right-hand corner is at (x1+175, y1+143). Here, x1 and y1 must each be a multiple of the size of the MCU (Minimum Coding Unit), the minimum processing unit in JPEG coding. If the MCU covers an image area 16 pixels wide by 16 pixels high, x1 and y1 must each be a multiple of 16; x1 is then one of 0, 16, 32, 48, 64, 80, 96, 112, 128 and 144, and y1 is one of 0, 16, 32, 48, 64, 80 and 96. FIG. 36 shows, as an example, the clipping of an image area whose upper left-hand corner is at (64, 48) and whose lower right-hand corner is at (239, 191).
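The MCU alignment rule above can be checked with a short Python sketch. It enumerates the admissible clip origins and then clips a grid of 8x8 coefficient blocks, copying whole blocks so that no block borderline moves. It assumes a single image component (for example luminance) with 8x8 blocks and ignores chroma subsampling; blocks are treated as opaque objects, and the function names are hypothetical.

```python
def clip_origins(src_w, src_h, dst_w, dst_h, mcu_w=16, mcu_h=16):
    """List the MCU-aligned x1 and y1 values at which a dst-size area fits in src."""
    xs = list(range(0, src_w - dst_w + 1, mcu_w))
    ys = list(range(0, src_h - dst_h + 1, mcu_h))
    return xs, ys


def clip_blocks(grid, x1, y1, dst_w, dst_h, block=8, mcu_w=16, mcu_h=16):
    """Clip a dst_w x dst_h pixel area from a grid of 8x8 coefficient blocks.

    grid[r][c] holds the block covering pixels x = 8c..8c+7, y = 8r..8r+7.
    Because (x1, y1) lies on an MCU borderline, whole blocks are copied
    and no block borderline moves.
    """
    if x1 % mcu_w or y1 % mcu_h:
        raise ValueError("clip origin must lie on an MCU borderline")
    if x1 + dst_w > len(grid[0]) * block or y1 + dst_h > len(grid) * block:
        raise ValueError("clip area exceeds the source image")
    c0, r0 = x1 // block, y1 // block
    return [row[c0:c0 + dst_w // block] for row in grid[r0:r0 + dst_h // block]]
```

With clip_origins(320, 240, 176, 144) the candidate lists match the values given for x1 and y1, and clipping at (64, 48) returns the 22 by 18 blocks covering pixels (64, 48) to (239, 191).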

[0224] Referring again to FIG. 35, at step S313 the QCIF size orthogonal conversion data obtained at step S312 is stored in a frame memory, and for each MPEG4 block (an image area 16 pixels wide by 16 pixels high), the orthogonal conversion data of the current frame is compared with that of the immediately preceding frame stored in the frame memory to calculate the inter-frame difference amount of the orthogonal conversion data for that block.

[0225] At step S314, the inter-frame difference amount calculated at step S313 is compared with a predetermined threshold. If it is larger than the threshold, processing proceeds to step S315; if it is equal to or smaller than the threshold, processing proceeds to step S316.

[0226] That is, for each block, either step S315 or step S316 is selected depending on the inter-frame difference amount.

[0227] At step S315, the orthogonal conversion data obtained at step S312 is subjected to MPEG4 entropy coding (Huffman coding or arithmetic coding as defined in MPEG4) in the INTRA mode (a mode in which data is coded using only image data of the current frame). At step S316, on the other hand, the inter-frame predicted deviation is regarded as zero, and MPEG4 entropy coding is carried out in the Inter mode (inter-frame predictive coding mode) based on that predicted deviation.
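The per-block decision of steps S313 to S316 can be sketched in Python as below. The source does not define how the inter-frame difference amount is measured; the sum of absolute coefficient differences used here is an assumption, as are the mode labels and function names. The first frame has no stored predecessor, so all of its blocks are coded in the INTRA mode.

```python
def block_difference(prev_block, cur_block):
    """Sum of absolute differences between two blocks' coefficient lists."""
    return sum(abs(p - c) for p, c in zip(prev_block, cur_block))


def choose_modes(prev_frame, cur_frame, threshold):
    """Pick a coding mode for each block of the current frame.

    prev_frame is None for the first frame, which has nothing to predict from.
    """
    modes = []
    for i, cur_block in enumerate(cur_frame):
        if prev_frame is None or block_difference(prev_frame[i], cur_block) > threshold:
            modes.append("INTRA")       # step S315: code from the current frame
        else:
            modes.append("INTER_ZERO")  # step S316: predicted deviation taken as 0
    return modes
```

Lowering the threshold sends more blocks to the INTRA path, trading compression for fidelity to small changes.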

[0228] At step S317, the per-block MPEG4 coded data created at step S315 or S316 are arranged in order to form incomplete (headerless) QCIF size MPEG4 coded data, and an appropriate MPEG4 coded data header is created and added to the head of the data, whereby QCIF size MPEG4 coded data is obtained.

[0229] This completes the coding conversion processing from QVGA size JPEG image data to QCIF size MPEG image data.

[0230] Note that steps S313 to S316 are not all required. If all blocks are processed in the INTRA mode, steps S313, S314 and S316 may be omitted and only step S315 performed. However, using the Inter mode yields a higher compression ratio than processing all blocks in the INTRA mode.

[0231] <Eleventh Embodiment>

[0232] Processing in a method of converting coded image data by the image delivery server in the eleventh embodiment will be described below with reference to FIGS. 37 and 38.

[0233] FIG. 37 is a flowchart showing the procedure by which the server converts coded data in the eleventh embodiment, converting image data of the 1/16 VGA display size (160 pixels wide by 120 pixels high) into image data of the larger QCIF display size (176 pixels wide by 144 pixels high).

[0234] Note that the procedure of the eleventh embodiment described below is also effective for other combinations of image sizes, as long as the image size after coding conversion is larger than the image size before conversion.

[0235] At step S321 of FIG. 37, the 1/16 VGA size JPEG coded image data is entropy-decoded (Huffman-decoded or arithmetic-decoded) to create 1/16 VGA size orthogonal conversion data (more specifically, the orthogonal conversion data of each block of each MCU (Minimum Coding Unit) included in the 1/16 VGA size image area).

[0236] At step S322, as shown in FIG. 38, the entire 1/16 VGA size image area is inserted into the QCIF size image area along MCU borderlines (more generally, block borderlines), and dummy data (orthogonal conversion data having a predetermined value) is inserted in the remaining part, creating QCIF size orthogonal conversion data (more specifically, the orthogonal conversion data of each block of each MCU included in the QCIF size image area).

[0237] The method of generating QCIF size image data from 1/16 VGA size image data will now be described with reference to FIG. 38.

[0238] FIG. 38 shows the correspondence between image areas when the entire 1/16 VGA size image area is inserted into the QCIF size image area. Suppose the upper left-hand corner of the QCIF size image area is at (0, 0) and its lower right-hand corner is at (175, 143). If the upper left-hand corner of the 1/16 VGA size image area to be inserted is at (x2, y2), its lower right-hand corner is at (x2+159, y2+119). Here, x2 and y2 must each be a multiple of the size of the MCU (Minimum Coding Unit), the minimum processing unit in JPEG coding. If the MCU covers an image area 16 pixels wide by 8 lines high, x2 must be a multiple of 16 and y2 a multiple of 8; x2 is then 0 or 16, and y2 is one of 0, 8, 16 and 24. FIG. 38 shows, as an example, the insertion of the 1/16 VGA size image area at the position whose upper left-hand corner is (0, 16) and whose lower right-hand corner is (159, 135). Dummy data is inserted in the remaining image area, shown hatched.
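The insertion of FIG. 38 can be sketched in Python in the same block-grid terms as the clipping case. The sketch assumes a single image component with 8x8 blocks and a 16 by 8 MCU as in the example; the dummy value is supplied by the caller, and the function names are hypothetical.

```python
def insert_origins(src_w, src_h, dst_w, dst_h, mcu_w=16, mcu_h=8):
    """List the MCU-aligned x2 and y2 values at which src fits inside dst."""
    return (list(range(0, dst_w - src_w + 1, mcu_w)),
            list(range(0, dst_h - src_h + 1, mcu_h)))


def pad_blocks(grid, x2, y2, dst_w, dst_h, dummy, block=8, mcu_w=16, mcu_h=8):
    """Place a grid of 8x8 blocks inside a larger dst-size grid filled with dummy.

    Every padded cell refers to the same dummy object, so it should be
    treated as read-only.
    """
    if x2 % mcu_w or y2 % mcu_h:
        raise ValueError("insert position must lie on an MCU borderline")
    rows, cols = dst_h // block, dst_w // block
    r0, c0 = y2 // block, x2 // block
    if r0 + len(grid) > rows or c0 + len(grid[0]) > cols:
        raise ValueError("source image does not fit at this position")
    out = [[dummy] * cols for _ in range(rows)]
    for r, src_row in enumerate(grid):
        # Copy whole blocks so every source block borderline stays on a block borderline.
        out[r0 + r][c0:c0 + len(src_row)] = src_row
    return out
```

insert_origins(160, 120, 176, 144) reproduces the candidates 0 and 16 for x2 and 0, 8, 16 and 24 for y2.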

[0239] Referring again to FIG. 37, at step S323 the QCIF size orthogonal conversion data created at step S322 is stored in a frame memory, and for each MPEG4 block (an image area 16 pixels wide by 16 lines high), the orthogonal conversion data of the current frame is compared with that of the immediately preceding frame stored in the frame memory to calculate the inter-frame difference amount of the orthogonal conversion data for that block.

[0240] At step S324, the inter-frame difference amount calculated at step S323 is compared with a predetermined threshold. If it is larger than the threshold, processing proceeds to step S325; if it is equal to or smaller than the threshold, processing proceeds to step S326.

[0241] Steps S325 and S326 are each carried out block by block. At step S325, the orthogonal conversion data created at step S322 is subjected to MPEG4 entropy coding (Huffman coding or arithmetic coding as defined in MPEG4) in the INTRA mode (a mode in which data is coded using only image data of the current frame). At step S326, on the other hand, the inter-frame predicted deviation is regarded as zero, and MPEG4 entropy coding is carried out in the Inter mode (inter-frame predictive coding mode) based on that predicted deviation.

[0242] At step S327, the per-block MPEG4 coded data created at step S325 or S326 are arranged in order to form incomplete (headerless) QCIF size MPEG4 coded data, and an appropriate MPEG4 coded data header is created and added to the head of the data, whereby QCIF size MPEG4 coded data is obtained.

[0243] This completes the coding conversion processing from 1/16 VGA size JPEG image data to QCIF size MPEG image data.

[0244] Furthermore, a server combining the tenth and eleventh embodiments can be built. For example, it is first determined whether the image display size before coding conversion is larger or smaller than the display size after conversion, and according to the result, either the processing of the tenth embodiment or that of the eleventh embodiment is carried out.
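The selection rule of a combined server can be sketched as a small Python function. The source only states that the comparison is made in advance; treating equal sizes as the clipping case and rejecting sizes that are larger in one dimension but smaller in the other are assumptions of this sketch, and the function name and return labels are hypothetical.

```python
def choose_conversion(src_size, dst_size):
    """Return "clip" (tenth embodiment) or "insert" (eleventh embodiment).

    Sizes are (width, height) tuples in pixels.
    """
    (src_w, src_h), (dst_w, dst_h) = src_size, dst_size
    if src_w >= dst_w and src_h >= dst_h:
        return "clip"    # source covers the target: clip along MCU borderlines
    if src_w <= dst_w and src_h <= dst_h:
        return "insert"  # source fits inside the target: pad with dummy data
    raise ValueError("source is larger in one dimension and smaller in the other")
```

For the sizes in the text, QVGA (320, 240) to QCIF (176, 144) selects clipping, and 1/16 VGA (160, 120) to QCIF selects insertion.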

[0245] <Twelfth Embodiment>

[0246] FIG. 39 is a flowchart showing another procedure by which the server converts coded data in the twelfth embodiment, converting image data of the QVGA display size (320 pixels wide by 240 pixels high) into image data of the smaller QCIF display size (176 pixels wide by 144 pixels high).

[0247] Whereas the tenth embodiment obtains the orthogonal conversion data included in the clipped image area, the twelfth embodiment obtains the JPEG coded data included in the clipped image area.

[0248] Note that Huffman coding is used here as the entropy coding method in both JPEG coding and MPEG4 image coding, but almost the same procedure can be applied to other combinations of coding methods whose processing includes block division, orthogonal conversion and entropy coding.

[0249] Further, as with the tenth embodiment, the twelfth embodiment described below is also effective for other combinations of image sizes, as long as the image size after coding conversion is smaller than the image size before conversion.

[0250] At step S331 of FIG. 39, the QCIF size image area is clipped from the QVGA size image area along MCU borderlines (more generally, block borderlines) as shown in FIG. 36, whereby QCIF size JPEG coded image data is obtained (more specifically, the JPEG coded image data included in the QCIF size image area).

[0251] At step S332, the Huffman codes for JPEG (Huffman codes used in JPEG coding) in the JPEG coded image data obtained at step S331 are converted to Huffman codes for MPEG4 in the INTRA mode (Huffman codes used in INTRA mode coding in MPEG4 image coding). To carry out step S332, the contents of both the Huffman code table for JPEG and the Huffman code table for MPEG4 must be known in advance.

[0252] The Huffman code table for MPEG4 is known in advance because it is prepared by the apparatus (relay server) to which the twelfth embodiment is applied, or by the software itself. The Huffman code table for JPEG, on the other hand, is defined in the header portion of the JPEG coded data, so it can be obtained by analyzing that header portion prior to the processing in FIG. 39. Alternatively, if the image delivery server sending the JPEG coded data is known to use a Huffman code table for JPEG with the same contents every time, that table can simply be stored.
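The table-driven principle of step S332 is sketched below in Python: a bit string is decoded symbol by symbol with one prefix-code table and re-encoded with another. The two tables are tiny hypothetical examples sharing one symbol alphabet. Real JPEG and MPEG4 codes do not map symbol for symbol in this way (JPEG AC codes carry a run/size symbol followed by amplitude bits, while MPEG4 INTRA AC codes use a (last, run, level) variable-length code), so a practical converter would also have to re-express each coefficient run; the sketch shows only the lookup mechanism.

```python
# Hypothetical toy prefix-code tables over a shared symbol alphabet.
JPEG_TOY = {"A": "0", "B": "10", "C": "11"}
MPEG4_TOY = {"A": "11", "B": "0", "C": "10"}


def transcode(bits, src_table, dst_table):
    """Decode bits with src_table and re-encode each symbol with dst_table."""
    decoder = {code: symbol for symbol, code in src_table.items()}
    out = []
    current = ""
    for bit in bits:
        current += bit
        if current in decoder:  # a complete source code word has been read
            out.append(dst_table[decoder[current]])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete code word")
    return "".join(out)
```

Because both tables are prefix codes, the decoder can emit a symbol as soon as the accumulated bits match an entry, with no lookahead.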

[0253] If the strings of Huffman codes for MPEG4 obtained at step S332 are brought together into one frame, incomplete (headerless) QCIF size MPEG4 coded data is obtained. At step S333, the strings of Huffman codes for MPEG4 obtained at step S332 are stored one after another in the frame memory, and for each MPEG4 macroblock (an area 16 pixels wide by 16 pixels high), the strings of the current frame are compared with those of the immediately preceding frame stored in the frame memory to determine whether an inter-frame difference exists for that macroblock.

[0254] At step S334, depending on the result of step S333, if an inter-frame difference exists, processing proceeds directly to step S336 without additional processing; if no inter-frame difference exists, processing proceeds to step S335 for additional processing. At step S335, all strings of Huffman codes for MPEG4 in the macroblock being processed are replaced with strings of Huffman codes for MPEG4 indicating an Inter mode predicted deviation of 0.
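Steps S333 to S335 amount to a per-macroblock comparison of code strings, sketched below in Python. Each macroblock is represented as one bit string, and ZERO_INTER stands for whatever code string the target encoder uses to signal an Inter mode macroblock with zero predicted deviation; both the value and the function name are hypothetical placeholders. The caller should keep the unmodified current strings, not the replaced output, as the previous frame for the next comparison, since step S333 stores the strings obtained at step S332.

```python
ZERO_INTER = "1"  # hypothetical placeholder for the zero-deviation Inter code


def skip_unchanged(prev_mbs, cur_mbs, zero_inter_code=ZERO_INTER):
    """Replace macroblocks whose code strings equal the previous frame's.

    prev_mbs is None for the first frame, which is passed through unchanged.
    """
    if prev_mbs is None:
        return list(cur_mbs)
    out = []
    for prev, cur in zip(prev_mbs, cur_mbs):
        # Step S334: identical strings mean no inter-frame difference (step S335).
        out.append(zero_inter_code if prev == cur else cur)
    return out
```

Comparing code strings directly avoids decoding coefficients, at the cost of treating any bit-level change as a difference.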

[0255] At step S336, the strings of Huffman codes for MPEG4 obtained at step S332 or S335 are brought together into one frame to form QCIF size MPEG4 coded data, and an appropriate MPEG4 coded data header is created and added to the head of the data, whereby QCIF size MPEG4 coded data is completed.

[0256] This completes the coding conversion processing from QVGA size JPEG image data to QCIF size MPEG image data.

[0257] Note that steps S333 to S335 are not always required. If all macroblocks are processed in the INTRA mode, steps S333 to S335 may all be omitted. However, using the Inter mode yields a higher compression ratio than processing all macroblocks in the INTRA mode.

[0258] <Thirteenth Embodiment>

[0259] FIG. 40 is a flowchart showing the procedure by which the server converts coded data in the thirteenth embodiment, converting image data of the 1/16 VGA display size (160 pixels wide by 120 pixels high) into image data of the larger QCIF display size (176 pixels wide by 144 pixels high).

[0260] Whereas the eleventh embodiment inserts orthogonal conversion data into the QCIF size image area, the thirteenth embodiment inserts strings of Huffman codes for MPEG4 into the QCIF size image area.

[0261] Huffman coding is used here as the entropy coding method in both JPEG coding and MPEG4 coding, but almost the same procedure can be applied to other combinations of coding methods whose processing includes block division, orthogonal conversion and entropy coding.

[0262] Further, as with the eleventh embodiment, the procedure of the thirteenth embodiment described below is also effective for other combinations of image display sizes, as long as the image display size after coding conversion is larger than that before conversion.

[0263] At step S341 of FIG. 40, the Huffman codes for JPEG (Huffman codes used in JPEG coding) in the 1/16 VGA size JPEG coded image data are converted to Huffman codes for MPEG4 in the INTRA mode (Huffman codes used in INTRA mode coding of MPEG4 image coding). To carry out step S341, the contents of both the Huffman code table for JPEG and the Huffman code table for MPEG4 must be known in advance; they are prepared beforehand as in the twelfth embodiment.

[0264] At step S342, the entire 1/16 VGA size image area (that is, the strings of Huffman codes for MPEG4 it contains) is inserted into the QCIF size image area along MCU borderlines (more generally, block borderlines) as shown in FIG. 38. Dummy data (strings of Huffman codes for MPEG4 having a predetermined value) is then inserted in the remaining image area, creating incomplete (headerless) QCIF size MPEG4 coded data (more specifically, the strings of Huffman codes for MPEG4 included in the QCIF size image area).

[0265] Steps S333 to S336 of FIG. 40 are the same as those of FIG. 39, so their description is omitted here.

[0266] <Fourteenth Embodiment>

[0267] FIG. 41 is a flowchart showing the procedure by which the server converts coded data in the fourteenth embodiment, converting image data of the QVGA display size (320 pixels wide by 240 pixels high) into image data of the smaller QCIF display size (176 pixels wide by 144 pixels high).

[0268] The fourteenth embodiment differs from the tenth embodiment in that the image is reduced by a factor of 2 and dummy data is inserted before the coding method is converted.

[0269] At step S361 of FIG. 41, the QVGA size JPEG coded data is JPEG-decoded to create QVGA size image data.

[0270] At step S362, the image is thinned out to scale it down by a factor of 2 in both the horizontal and vertical directions (any factor that reduces the image to a size smaller than the QCIF size is acceptable), creating 1/16 VGA size image data.

[0271] FIG. 42 shows how block deformation increases when the image is reduced by a factor of 2 in both the horizontal and vertical directions. In this figure, block deformation caused by JPEG coding exists at the positions shown by solid lines (spaced 8 pixels apart horizontally and 8 lines apart vertically). When the image is scaled down by a factor of 2 in both directions, the block deformation caused by JPEG coding moves to the positions shown by the solid and dotted lines (spaced 4 pixels apart horizontally and 4 lines apart vertically). In addition, when this image is subjected to MPEG4 image coding, block deformation caused by MPEG4 coding may be added at the positions shown by the solid lines. That is, the scaledown causes additional block deformation at the positions shown by the dotted lines.

[0272] Referring again to FIG. 41, at step S363, the pixels near the positions shown by the dotted lines (the central axes of each block in the horizontal and vertical directions) are subjected to smoothing processing to make less conspicuous the block deformation that additionally occurs at those positions, as described with reference to FIG. 42.
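Steps S362 and S363 can be sketched in Python on a grayscale image stored as a list of rows of integers. Thinning keeps every other pixel with no prefilter, following the wording "thinned out". For smoothing, the source does not specify a filter; this sketch assumes that the two pixels straddling each block's central axis (between offsets 3 and 4 of every 8-pixel block) are blended with 3:1 weights, first along rows and then along columns. The function names are hypothetical.

```python
def thin_by_2(img):
    """Keep every other pixel in both directions (step S362)."""
    return [row[::2] for row in img[::2]]


def _smooth_line(line, block=8):
    """Blend the two samples straddling each block's central axis."""
    out = list(line)
    for x in range(block // 2, len(line), block):
        a, b = line[x - 1], line[x]
        out[x - 1] = (3 * a + b + 2) // 4  # rounded 3:1 blend toward a
        out[x] = (a + 3 * b + 2) // 4      # rounded 1:3 blend toward b
    return out


def smooth_block_centers(img, block=8):
    """Smooth across the dotted-line positions of FIG. 42 (step S363)."""
    rows = [_smooth_line(row, block) for row in img]
    cols = [_smooth_line(list(col), block) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]
```

The dotted lines stay at block centers after step S364 only because the insertion offsets are multiples of the 8-pixel block size.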

[0273] Then, at step S364, the entire 1/16 VGA size image area is inserted into the QCIF size image area along MCU borderlines (more generally, block borderlines) as shown in FIG. 38, and dummy data (image data having a predetermined value) is inserted in the remaining part, creating QCIF size image data. At step S365, the QCIF size image data is subjected to MPEG4 image coding to create QCIF size MPEG4 coded image data.

[0274] This completes the coding conversion processing from QVGA size JPEG image data to QCIF size MPEG image data.

[0275] <Fifteenth Embodiment>

[0276] FIG. 43 is a flowchart showing the procedure by which the server converts coded data in the fifteenth embodiment, converting image data of the 1/16 VGA display size (160 pixels wide by 120 pixels high) into image data of the larger QCIF display size (176 pixels wide by 144 pixels high).

[0277] The fifteenth embodiment differs from the eleventh embodiment in that the image is enlarged by a factor of 2 and the QCIF size image is clipped from the enlarged image before the coding method is converted.

[0278] At step S351 of FIG. 43, the 1/16 VGA size JPEG coded data is JPEG-decoded to create 1/16 VGA size image data.

[0279] At step S352, the image data is enlarged by a factor of 2 through interpolation processing (any factor that enlarges the image to a size larger than the QCIF display size is acceptable), creating QVGA size image data.

[0280] At step S353, as shown in FIG. 36, the QCIF size image area is clipped from the created QVGA size image data along MCU borderlines (more generally, block borderlines), whereby QCIF size image data is obtained.
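Steps S352 and S353 can be sketched in Python on a grayscale image held as a list of rows. The source says only "interpolation processing"; the separable linear interpolation used here (each new sample is the mean of its two neighbours, with the last sample replicated at the edge) is an assumed choice. The clip reuses the MCU-aligned origin rule of FIG. 36, assuming a 16 by 16 MCU. The function names are hypothetical.

```python
def _upsample_line(line):
    """Double a 1-D signal: keep each sample and insert the mean of neighbours."""
    out = []
    for i, v in enumerate(line):
        nxt = line[i + 1] if i + 1 < len(line) else v  # replicate the last sample
        out.append(v)
        out.append((v + nxt) / 2)
    return out


def upscale_by_2(img):
    """Enlarge an image by 2 in both directions (step S352)."""
    rows = [_upsample_line(row) for row in img]
    cols = [_upsample_line(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def clip(img, x1, y1, w, h, mcu=16):
    """Clip a w x h area whose origin lies on an MCU borderline (step S353)."""
    if x1 % mcu or y1 % mcu:
        raise ValueError("clip origin must lie on an MCU borderline")
    if x1 + w > len(img[0]) or y1 + h > len(img):
        raise ValueError("clip area exceeds the image")
    return [row[x1:x1 + w] for row in img[y1:y1 + h]]
```

A 160 by 120 input becomes 320 by 240, from which clip(img, 64, 48, 176, 144) extracts the same area as the FIG. 36 example.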

[0281] At step S354, the QCIF size image data is subjected to MPEG4 image coding to create QCIF size MPEG4 coded data.

[0282] This completes the coding conversion processing from 1/16 VGA size JPEG image data to QCIF size MPEG image data.

[0283] As described above, according to the tenth to fifteenth embodiments, when coded data is converted to a different format, measures are taken to keep block borderlines from shifting between before and after the conversion wherever possible, which suppresses the degradation of image quality caused by block deformation introduced by the conversion.

[0284] <Other Embodiment>

[0285] Further, the object of the present invention can also be achieved by providing a computer system or apparatus (e.g., a personal computer) with a storage medium storing program codes for performing the aforesaid processes, and having a CPU or MPU of the computer system or apparatus read the program codes from the storage medium and execute the program.

[0286] In this case, the program codes read from the storage medium realize the functions according to the embodiments, and the storage medium storing the program codes constitutes the invention.

[0287] Further, a storage medium such as a floppy disk, a hard disk, an optical disk, a magneto-optical disk, a CD-ROM, a CD-R, a magnetic tape, a non-volatile memory card or a ROM, or a computer network such as a LAN (local area network) or a WAN (wide area network), can be used for providing the program codes.

[0288] Furthermore, besides the case where the functions according to the above embodiments are realized by a computer executing the program codes it has read, the present invention includes a case where an OS (operating system) or the like running on the computer performs part or all of the processes in accordance with the designations of the program codes and thereby realizes the functions according to the above embodiments.

[0289] Furthermore, the present invention also includes a case where, after the program codes read from the storage medium are written to a function expansion card inserted into the computer or to a memory provided in a function expansion unit connected to the computer, a CPU or the like contained in the function expansion card or unit performs part or all of the processes in accordance with the designations of the program codes and realizes the functions of the above embodiments.

[0290] When the present invention is applied to the aforesaid storage medium, the storage medium stores program codes corresponding to the flowcharts described in the embodiments.

[0291] The present invention is not limited to the above embodiments, and various changes and modifications can be made within the spirit and scope of the present invention. Therefore, to apprise the public of the scope of the present invention, the following claims are made.

What is claimed is:
 1. An image processing apparatus comprising: an image information reception unit adapted to receive image information obtained by sensing from an image sensing apparatus capable of delivering the image information to a first image display apparatus; and an image information delivering unit adapted to convert said image information received by said image information reception unit to a format for a second image display apparatus different in type from said first image display apparatus, and deliver the converted image information to said second display apparatus.
 2. The image processing apparatus according to claim 1, wherein said image information delivering unit delivers another information different from said image information to said second image display apparatus.
 3. The image processing apparatus according to claim 2, comprising an image processing unit adapted to superimpose said another information on said image information, wherein said image information delivering unit delivers said image information with said another information superimposed thereon by said image processing unit to said second image display apparatus.
 4. The image processing apparatus according to claim 2, wherein said image information delivering unit delivers said another information to said second image display apparatus for a time period over which the operation of said image sensing apparatus is controlled.
 5. The image processing apparatus according to claim 2, wherein said image information delivering unit delivers said another information to said second image display apparatus for a time period until the image processing apparatus establishes connection with said image sensing apparatus.
 6. The image processing apparatus according to claim 1, further comprising a sound data sending unit adapted to convert response information to control information for controlling an image sensing operation from said image processing apparatus into sound data for said second image display apparatus, and send the sound data to said second image display apparatus.
 7. The image processing apparatus according to claim 2, further comprising a first information acquisition unit adapted to acquire another information retrieved by an information storage device storing various kinds of information on the basis of control information for controlling an image sensing operation from said information storage device, wherein said image information delivering unit delivers said another information obtained by said first information acquisition unit to said second image display apparatus.
 8. The image processing apparatus according to claim 7, wherein said control information includes at least information for controlling pan and/or tilt angles in the image sensing operation of said image sensing apparatus.
 9. The image processing apparatus according to claim 7, wherein said control information includes at least information for controlling the zoom scaling factor in the image sensing operation of said image sensing apparatus.
 10. The image processing apparatus according to claim 7, wherein said first information acquisition unit obtains said another information retrieved by said information storage device according to the current time.
 11. The image processing apparatus according to claim 7, wherein said first information acquisition unit obtains said another information retrieved by said information storage device according to said image sensing apparatus.
 12. The image processing apparatus according to claim 2, comprising a second information acquisition unit adapted to obtain information indicating a control state related to the image sensing operation of said image sensing apparatus as said another information, wherein said information delivering unit delivers said another information obtained by said second information acquisition unit to said second image display apparatus.
 13. An image delivery system comprising: an image sensing apparatus capable of delivering image information obtained by image sensing to a first image display apparatus; a second image display apparatus different in type from said first image display apparatus, adapted to request said image information sensed by said image sensing apparatus; and an image processing apparatus adapted to obtain said image information from said image sensing apparatus by accepting the request for said image information from said second image display apparatus, and deliver said image information to said second image display apparatus.
 14. The image delivery system according to claim 13,wherein said image processing apparatus superimposes another informationon said image information, and delivers the same to said second imagedisplay apparatus.
15. An image delivery system comprising: an image sensing apparatus capable of delivering image information obtained by image sensing to a first image display apparatus; a second image display apparatus different in type from said first image display apparatus, adapted to send sound data indicating a request for said image information sensed by said image sensing apparatus; and an image processing apparatus adapted to convert said sound data from said second image display apparatus into a control command for said image sensing apparatus, and send said control command to said image sensing apparatus, wherein said image processing apparatus receives said image information sent from said image sensing apparatus according to said control command, and delivers said image information to said second image display apparatus.
16. The image delivery system according to claim 15, wherein said image processing apparatus converts response information to said control command from said image sensing apparatus into sound data for said second image display apparatus, and sends the sound data to said second image display apparatus.
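As an illustration of the two conversions in claims 15 and 16, the following is a minimal sketch, not the claimed implementation. It assumes the sound data from the portable terminal carries keypad (DTMF) tones that a detector, not shown, has already decoded into digits, and that a text-to-speech stage, also not shown, turns the returned text into sound data. The key assignments, step sizes, command tuples and the names `KEY_TO_COMMAND`, `keys_to_commands` and `response_to_prompt` are all hypothetical.

```python
# Hypothetical mapping from phone keypad digits (already decoded from the
# DTMF tones in the received sound data) to camera control commands.
KEY_TO_COMMAND = {
    "2": ("tilt", +5),  # tilt up 5 degrees
    "8": ("tilt", -5),  # tilt down 5 degrees
    "4": ("pan", -5),   # pan left 5 degrees
    "6": ("pan", +5),   # pan right 5 degrees
    "1": ("zoom", +1),  # zoom in one step
    "3": ("zoom", -1),  # zoom out one step
}


def keys_to_commands(keys):
    """Translate decoded key digits into (operation, step) control commands.

    Keys with no assigned meaning are ignored rather than rejected.
    """
    return [KEY_TO_COMMAND[k] for k in keys if k in KEY_TO_COMMAND]


def response_to_prompt(response):
    """Turn a camera server response (a dict, by assumption) into text.

    A text-to-speech stage (not shown) would convert this text into the
    sound data sent back to the portable terminal.
    """
    if response.get("ok"):
        return "Camera moved to pan {pan}, tilt {tilt}.".format(**response)
    return "Camera control is not available now."
```

For example, `keys_to_commands("26")` yields `[("tilt", 5), ("pan", 5)]`, and a successful response such as `{"ok": True, "pan": 10, "tilt": -5}` becomes the prompt "Camera moved to pan 10, tilt -5."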
17. An image processing method comprising: receiving image information obtained by image sensing from an image sensing apparatus capable of delivering the image information to a first image display apparatus; and delivering the received image information to a second image display apparatus different in type from said first image display apparatus.
18. A computer program product comprising a computer usable medium having computer readable program code means embodied in said medium for carrying out the image processing method of claim 17.
19. An information delivery apparatus comprising: an image data reception unit adapted to receive image data from a plurality of image sending apparatuses capable of sending said image data; a sound data reception unit adapted to receive sound data from a plurality of sound sending apparatuses; a coding unit adapted to selectively combine said image data received by said image data reception unit with said sound data received by said sound data reception unit, and code the combined data as image data with sound; and a delivering unit adapted to deliver said image data with sound generated by said coding unit to a receiving apparatus.
20. The information delivery apparatus according to claim 19, further comprising an information storage unit adapted to store, as information, at least one or more predetermined conditions and the correspondence between said conditions and various kinds of sound data.
21. The information delivery apparatus according to claim 20, wherein said one or more predetermined conditions include at least one of conditions related to the status of said image sending apparatus, conditions related to personal information of a user, and conditions related to time.
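The stored correspondence of claims 20 and 21 can be pictured as a small rule table consulted before the coding unit combines image and sound. The sketch below is one possible shape for it, under assumptions of our own: each rule may constrain the hour of day, the camera status and the user, unset fields match anything, and the first matching rule wins. The `Rule` class, the `select_sound` function, the sound identifiers and the default value are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    """One stored condition and the sound data it selects (hypothetical)."""
    sound_id: str                       # which sound source or clip to combine
    hours: Optional[range] = None       # time condition: hours of the day
    camera_status: Optional[str] = None # status of the image sending apparatus
    user: Optional[str] = None          # condition on the user's personal info

    def matches(self, hour, camera_status, user):
        # A field left as None places no constraint on that condition.
        return ((self.hours is None or hour in self.hours)
                and (self.camera_status is None or camera_status == self.camera_status)
                and (self.user is None or user == self.user))


def select_sound(rules, hour, camera_status, user, default="none"):
    """Return the sound_id of the first matching rule, else the default."""
    for rule in rules:
        if rule.matches(hour, camera_status, user):
            return rule.sound_id
    return default


RULES = [
    Rule("alarm", camera_status="error"),
    Rule("guide_en", user="alice"),
    Rule("night_bgm", hours=range(18, 24)),
    Rule("day_bgm", hours=range(6, 18)),
]
```

Ordering the list by priority keeps the lookup trivial; the alteration unit of claim 22 would then amount to editing `RULES` on request from the receiving apparatus.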
22. The information delivery apparatus according to claim 20, further comprising an alteration unit adapted to alter the contents of information stored in said information storage unit upon request from said receiving apparatus.
23. The information delivery apparatus according to claim 19, wherein said image data reception unit and said sound data reception unit are apparatuses connected on a network independently of said information delivery apparatus.
24. An information delivery system comprising a plurality of image sending apparatuses capable of sending image data, a plurality of sound sending apparatuses capable of sending sound data, a plurality of information delivery apparatuses adapted to deliver information, and a receiving apparatus adapted to receive said information delivered from said information delivery apparatus, wherein said information delivery apparatus comprises: an image data reception unit adapted to receive said image data from said image sending apparatus; a sound data reception unit adapted to receive said sound data from said sound sending apparatus; a coding unit adapted to selectively combine said image data received by said image data reception unit with said sound data received by said sound data reception unit, and code the combined data as image data with sound; and a delivering unit adapted to deliver said image data with sound generated by said coding unit, and said receiving apparatus receives as said information said image data with sound delivered from said delivering unit of said information delivery apparatus.
25. An information delivery method comprising: receiving image data from a plurality of image sending apparatuses; receiving sound data from a plurality of sound sending apparatuses; selectively combining received image data and sound data and coding them as image data with sound; and delivering said coded image data with sound to a receiving apparatus.
26. A computer program product comprising a computer usable medium having computer readable program code means embodied in said medium for carrying out the information delivery method of claim 25.
27. A conversion processing method in which image data of a first size coded by a first coding method in which data is divided into first blocks and coded is converted into image data of a second size coded by a second coding method in which data is divided into second blocks and coded, comprising: clipping image data equivalent to said second size from image data of said first size along a borderline of said first block; and coding the clipped image data equivalent to said second size by the second coding method.
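A minimal sketch of the clipping step of claim 27 follows. It assumes single-channel decoded images held as lists of rows, JPEG's 8x8 blocks as the first blocks, and a QCIF-sized (176x144) target; the function names are hypothetical. The window is centred as closely as possible and its offset is rounded down to a multiple of 8, so the clip starts on a first-block borderline; rounding down, never up, keeps the window inside the source.

```python
def aligned_clip_origin(src_w, src_h, dst_w, dst_h, block=8):
    """Top-left corner (x0, y0) of a dst_w x dst_h window inside the source.

    The window is centred as nearly as possible, then each offset is rounded
    down to a multiple of `block` so the window begins on a block borderline.
    """
    if dst_w > src_w or dst_h > src_h:
        raise ValueError("target size exceeds source size")

    def snap(src, dst):
        centred = (src - dst) // 2
        return (centred // block) * block  # round down to the block grid

    return snap(src_w, dst_w), snap(src_h, dst_h)


def clip(image, x0, y0, w, h):
    """Cut a w x h window out of `image` (a list of rows) at (x0, y0)."""
    return [row[x0:x0 + w] for row in image[y0:y0 + h]]
```

For a 320x240 source the origin is (72, 48); for 300x200 the centred offsets (62, 28) snap down to (56, 24). Because 176 and 144 are themselves multiples of 8, the clipped region then consists of whole 8x8 blocks, which is presumably what allows claim 29 to clip while the data is still JPEG-coded.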
28. The conversion processing method according to claim 27, wherein image data of said first size is decoded by a decoding method corresponding to said first coding method prior to said clipping.
29. The conversion processing method according to claim 27, wherein image data is clipped with the image data coded by said first coding method at the time of said clipping, and the image data coded by said first coding method is converted into image data of said second coding method at the time of said coding.
30. The conversion processing method according to claim 27, wherein image data of said first size is scaled up by a factor of n by carrying out interpolation processing if said first size is smaller than said second size, and the image data equivalent to said second size is clipped from the image data scaled up by a factor of n along a borderline of said first block at the time of said clipping.
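Claim 30 covers a source smaller than the target. A sketch under stated assumptions: n is taken as the smallest integer that makes the enlarged image cover the target in both directions, nearest-neighbour (pixel replication) stands in for the unspecified interpolation processing (bilinear would give smoother results), and the window is snapped to an 8-pixel grid in the enlarged image, which is one reading of "along a borderline of said first block". All names are hypothetical.

```python
def upscale_factor(src_w, src_h, dst_w, dst_h):
    """Smallest integer n >= 1 such that n * src covers dst in both directions."""
    # -(-a // b) is integer ceiling division.
    return max(1, -(-dst_w // src_w), -(-dst_h // src_h))


def upscale_nearest(image, n):
    """Enlarge by n using nearest-neighbour (pixel replication) interpolation."""
    out = []
    for row in image:
        wide = [p for p in row for _ in range(n)]  # repeat each pixel n times
        out.extend(list(wide) for _ in range(n))   # repeat each row n times
    return out


def clip_aligned(image, dst_w, dst_h, block=8):
    """Clip a dst_w x dst_h window near the centre, starting on block borderlines."""
    src_h, src_w = len(image), len(image[0])
    x0 = ((src_w - dst_w) // 2 // block) * block
    y0 = ((src_h - dst_h) // 2 // block) * block
    return [row[x0:x0 + dst_w] for row in image[y0:y0 + dst_h]]
```

A 160x120 source aimed at 176x144 gives n = 2; the 320x240 enlargement is then clipped from offset (72, 48), exactly as a native 320x240 source would be.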
31. The conversion processing method according to claim 27, further comprising: comparing image data of the previous frame with image data of the current frame to make an inter-frame prediction for each block; and selecting, on the basis of said inter-frame prediction, whether to perform coding by using image data in the current frame by said second coding method or to perform coding by using inter-frame predicted deviation information by said second coding method.
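The per-block choice in claim 31 can be illustrated with a common encoder heuristic, swapped in here because the claim names no decision rule: predict each block from the co-located block of the previous frame (no motion search), and code it as inter-frame deviation only when the absolute prediction error is smaller than the block's own absolute deviation from its mean, a rough proxy for intra coding cost. Function names and the 16-pixel default block are assumptions.

```python
def block_mode(cur_block, prev_block):
    """Choose 'inter' or 'intra' coding for one block.

    Inter cost: sum of absolute differences against the co-located block of
    the previous frame (zero-motion prediction). Intra cost: sum of absolute
    deviations from the block's own mean.
    """
    cur = [p for row in cur_block for p in row]
    prev = [p for row in prev_block for p in row]
    mean = sum(cur) / len(cur)
    intra_cost = sum(abs(p - mean) for p in cur)
    inter_cost = sum(abs(c - q) for c, q in zip(cur, prev))
    return "inter" if inter_cost < intra_cost else "intra"


def mode_map(cur_frame, prev_frame, block=16):
    """Apply block_mode to every block of a frame, in raster order."""
    h, w = len(cur_frame), len(cur_frame[0])
    modes = []
    for by in range(0, h, block):
        row = []
        for bx in range(0, w, block):
            cb = [r[bx:bx + block] for r in cur_frame[by:by + block]]
            pb = [r[bx:bx + block] for r in prev_frame[by:by + block]]
            row.append(block_mode(cb, pb))
        modes.append(row)
    return modes
```

A static textured block costs nothing to predict and is coded as inter; a block whose pattern flips between frames predicts worse than it codes on its own and falls back to intra. Ties go to intra, so a flat unchanged block is coded from the current frame.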
32. The conversion processing method according to claim 27, wherein said first coding method is the JPEG coding method, and said second coding method is the MPEG coding method.
33. A conversion processing method in which image data of a first size coded by a first coding method in which data is divided into first blocks and coded is converted into image data of a second size coded by a second coding method in which data is divided into second blocks and coded, comprising: generating image data equivalent to said second size by inserting predetermined data in the periphery of image data of said first size in said second blocks; and coding the generated image data equivalent to said second size by the second coding method.
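Claim 33 enlarges the canvas instead of cutting it. In the sketch below, which assumes single-channel images as lists of rows, 16x16 macroblocks as the second blocks and mid-grey (128) as the "predetermined data", the source is placed as close to the centre as a macroblock-aligned offset allows, so the padding to its left and top is made of whole macroblocks; any remainder goes to the right and bottom. The function name, fill value and sizes are hypothetical.

```python
def pad_to_size(image, dst_w, dst_h, block=16, fill=128):
    """Surround `image` with `fill` pixels to reach dst_w x dst_h.

    Returns the padded image and the (x0, y0) offset of the original
    picture, with both offsets rounded down to multiples of `block`.
    """
    src_h, src_w = len(image), len(image[0])
    if src_w > dst_w or src_h > dst_h:
        raise ValueError("source size exceeds target size")
    x0 = ((dst_w - src_w) // 2 // block) * block
    y0 = ((dst_h - src_h) // 2 // block) * block
    out = [[fill] * dst_w for _ in range(dst_h)]
    for y, row in enumerate(image):
        out[y0 + y][x0:x0 + src_w] = row  # same length, so a pure overwrite
    return out, (x0, y0)
```

A 128x96 source placed in a 176x144 frame lands at (16, 16), leaving one macroblock of padding above and to the left and two below and to the right.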
34. The conversion processing method according to claim 33, wherein image data of said first size is decoded by a decoding method corresponding to said first coding method prior to said insertion.
35. The conversion processing method according to claim 33, further comprising: scaling down the image data of said first size by a factor of n by thinning out the image data when said first size is larger than said second size; and smoothing pixels located in borderlines of blocks of said image data scaled down, wherein the image data subjected to said smoothing is used at the generating of said image data.
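Claim 35 shrinks an oversized source before padding. The sketch assumes n is the smallest integer for which the thinned image fits the target, thinning means keeping every n-th pixel of every n-th row, and smoothing blends the two pixels on either side of each 8-pixel borderline, moving each a quarter of the way toward its neighbour. The claim fixes none of these details; the block size, blend weights and names are assumptions.

```python
def downscale_factor(src_w, src_h, dst_w, dst_h):
    """Smallest integer n >= 1 such that the thinned image fits in dst."""
    return max(1, -(-src_w // dst_w), -(-src_h // dst_h))


def thin_out(image, n):
    """Keep every n-th pixel of every n-th row (simple decimation)."""
    return [row[::n] for row in image[::n]]


def smooth_block_borders(image, block=8):
    """Blend the two pixels on each side of every block borderline.

    Each pixel moves a quarter of the way toward its neighbour across the
    border, first for vertical borderlines, then for horizontal ones.
    """
    out = [list(row) for row in image]
    h, w = len(out), len(out[0])
    for row in out:
        for x in range(block, w, block):
            a, b = row[x - 1], row[x]
            row[x - 1] = (3 * a + b + 2) // 4
            row[x] = (a + 3 * b + 2) // 4
    for y in range(block, h, block):
        above, below = out[y - 1], out[y]
        for x in range(w):
            a, b = above[x], below[x]
            above[x] = (3 * a + b + 2) // 4
            below[x] = (a + 3 * b + 2) // 4
    return out
```

A 640x480 source aimed at 176x144 gives n = 4 and a 160x120 thinned image, which would then be smoothed and padded to size. A hard step from 0 to 100 across a borderline becomes 0, 25, 75, 100.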
36. The conversion processing method according to claim 33, wherein image data is clipped with the image data coded by said first coding method at the time of said clipping, and the image data coded by said first coding method is converted into image data of said second coding method at the time of said coding.
37. The conversion processing method according to claim 33, further comprising: making an inter-frame prediction for each block by comparing image data of the previous frame with image data of the current frame; and selecting, on the basis of said inter-frame prediction, whether to perform coding by using image data in the current frame by said second coding method or to perform coding by using inter-frame predicted deviation information by said second coding method.
38. The conversion processing method according to claim 33, wherein said first coding method is the JPEG coding method, and said second coding method is the MPEG coding method.
39. A conversion processing method in which image data of a first size coded by a first coding method in which data is divided into first blocks and coded is converted into image data of a second size coded by a second coding method in which data is divided into second blocks and coded, comprising: converting image data of said first size into image data of said second coding method; and inserting predetermined data coded by said second coding method in the periphery of the converted image data of said first size in said second blocks so that said second image size is obtained.
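In claim 39 the padding is inserted after conversion, as already-coded blocks. A sketch of the bookkeeping, assuming 16x16 macroblocks, all sizes being multiples of 16, and the converted picture centred on the macroblock grid: it labels each macroblock position of the target frame as either a transcoded source macroblock ("img") or a copy of a predetermined pre-coded macroblock ("pad"). The name and labels are hypothetical, and a real bitstream writer would also have to respect the target codec's inter-macroblock prediction rules, which this sketch ignores.

```python
def macroblock_layout(src_w, src_h, dst_w, dst_h, mb=16):
    """Label every macroblock of the target frame as 'img' or 'pad'.

    'img' marks positions filled with macroblocks of the converted source;
    'pad' marks positions filled with a predetermined pre-coded macroblock.
    """
    for v in (src_w, src_h, dst_w, dst_h):
        if v % mb:
            raise ValueError("all sizes must be multiples of the macroblock size")
    if src_w > dst_w or src_h > dst_h:
        raise ValueError("source size exceeds target size")
    cols, rows = dst_w // mb, dst_h // mb
    src_cols, src_rows = src_w // mb, src_h // mb
    c0 = (cols - src_cols) // 2  # first macroblock column of the picture
    r0 = (rows - src_rows) // 2  # first macroblock row of the picture
    grid = []
    for r in range(rows):
        row = []
        for c in range(cols):
            inside = r0 <= r < r0 + src_rows and c0 <= c < c0 + src_cols
            row.append("img" if inside else "pad")
        grid.append(row)
    return grid
```

For a 128x96 picture in a 176x144 frame, the 11x9 macroblock grid holds an 8x6 block of "img" entries starting at column 1, row 1, and the remaining 51 positions are "pad". An encoder would emit the grid in raster order.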
40. The conversion processing method according to claim 39, wherein said first coding method is the JPEG coding method, and said second coding method is the MPEG coding method.
41. A conversion processing apparatus converting image data of a first size coded by a first coding method in which data is divided into first blocks and coded into image data of a second size coded by a second coding method in which data is divided into second blocks and coded, comprising: a clipping unit adapted to clip image data equivalent to said second size from image data of said first size along the borderline of said first block; and a coding unit adapted to code the clipped image data equivalent to said second size by the second coding method.
42. A conversion processing apparatus converting image data of a first size coded by a first coding method in which data is divided into first blocks and coded into image data of a second size coded by a second coding method in which data is divided into second blocks and coded, comprising: a data generation unit adapted to insert predetermined data in the periphery of image data of said first size in said second blocks so that said second image size is obtained to generate image data equivalent to said second size; and a coding unit adapted to code the generated image data equivalent to said second size by the second coding method.
43. A conversion processing apparatus converting image data of a first size coded by a first coding method in which data is divided into first blocks and coded into image data of a second size coded by a second coding method in which data is divided into second blocks and coded, comprising: a conversion unit adapted to convert image data of said first size into image data of said second coding method; and an insertion unit adapted to insert predetermined data coded by said second coding method in the periphery of the converted image data of said first size in said second blocks so that said second image size is obtained.
44. A computer program product comprising a computer usable medium having computer readable program code means embodied in said medium for carrying out the conversion processing method of claim 27.
45. A computer program product comprising a computer usable medium having computer readable program code means embodied in said medium for carrying out the conversion processing method of claim 33.
46. A computer program product comprising a computer usable medium having computer readable program code means embodied in said medium for carrying out the conversion processing method of claim 39.